I wrote a PLINQ query that ends with the ForAll operator, and I used the WithCancellation operator in order to cancel the query midway. Surprisingly, the query is not canceled midway as I expected. Here is a minimal demonstration of this behavior:
CancellationTokenSource cts = new CancellationTokenSource(1000);
cts.Token.Register(() => Console.WriteLine("--Token Canceled"));
try
{
Enumerable.Range(1, 20)
.AsParallel()
.WithDegreeOfParallelism(2)
.WithCancellation(cts.Token)
.ForAll(x =>
{
Console.WriteLine($"Processing item #{x}");
Thread.Sleep(200);
//cts.Token.ThrowIfCancellationRequested();
});
Console.WriteLine($"The query was completed successfully");
}
catch (OperationCanceledException)
{
Console.WriteLine($"The query was canceled");
}
Online demo.
Output (undesirable):
Processing item #1
Processing item #2
Processing item #4
Processing item #3
Processing item #5
Processing item #6
Processing item #8
Processing item #7
Processing item #10
Processing item #9
--Token Canceled
Processing item #11
Processing item #12
Processing item #13
Processing item #14
Processing item #15
Processing item #16
Processing item #17
Processing item #19
Processing item #20
Processing item #18
The query was canceled
The query completes with an OperationCanceledException, but not before processing all 20 items. The desirable behavior emerges when I uncomment the cts.Token.ThrowIfCancellationRequested(); line.
Output (desirable):
Processing item #2
Processing item #1
Processing item #3
Processing item #4
Processing item #5
Processing item #6
Processing item #7
Processing item #8
Processing item #9
Processing item #10
--Token Canceled
The query was canceled
Am I doing something wrong, or is this the by-design behavior of the ForAll+WithCancellation combination? Or is it a bug in the PLINQ library?
It seems to be by design, but the logic is a bit different than you might expect. If we dig into the source code a bit, we'll find the relevant piece of the ForAll implementation here:
while (_source.MoveNext(ref element, ref keyUnused))
{
if ((i++ & CancellationState.POLL_INTERVAL) == 0)
_cancellationToken.ThrowIfCancellationRequested();
_elementAction(element);
}
So it does check for cancellation, but not on every iteration. If we check CancellationState.POLL_INTERVAL:
/// <summary>
/// Poll frequency (number of loops per cancellation check) for situations where per-1-loop testing is too high an overhead.
/// </summary>
internal const int POLL_INTERVAL = 63; //must be of the form (2^n)-1.
// The two main situations requiring POLL_INTERVAL are:
// 1. inner loops of sorting/merging operations
// 2. tight loops that perform very little work per MoveNext call.
// Testing has shown both situations have similar requirements and can share the same constant for polling interval.
//
// Because the poll checks are per-N loops, if there are delays in user code, they may affect cancellation timeliness.
// Guidance is that all user-delegates should perform cancellation checks at least every 1ms.
//
// Inner loop code should poll once per n loop, typically via:
// if ((i++ & CancellationState.POLL_INTERVAL) == 0)
// _cancellationToken.ThrowIfCancellationRequested();
// (Note, this only behaves as expected if FREQ is of the form (2^n)-1
So basically the PLINQ developers assume that the code inside ForAll (and similar methods) is very fast, and as such they consider it wasteful to check for cancellation on every iteration, so they check every 64 iterations instead. If you have long-running code, you can check for cancellation yourself. I guess they had to do it like this because they can't do the right thing for all situations here; if they checked on every iteration, you would not be able to avoid the performance cost.
If you increase the number of iterations in your code and adjust the cancellation timeout, you'll see that it does indeed cancel after about 64 iterations (on each partition, so 128 in total).
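For example, a quick way to observe this (a sketch; the iteration count and timings are picked arbitrarily):
CancellationTokenSource cts = new CancellationTokenSource(100);
int processed = 0;
try
{
    Enumerable.Range(1, 10_000)
        .AsParallel()
        .WithDegreeOfParallelism(2)
        .WithCancellation(cts.Token)
        .ForAll(x =>
        {
            Interlocked.Increment(ref processed);
            Thread.Sleep(10);
        });
}
catch (OperationCanceledException) { }
Console.WriteLine($"Processed {processed} items"); // typically ~64 per partition, so ~128 in total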
Evk's answer thoroughly explains the observed behavior: the PLINQ operators check the cancellation token periodically, not for each processed item. I searched for a way to alter this behavior, and I think that I found one. When the parallel query is enumerated with a foreach loop, the cancellation token is checked on each iteration. So here is the solution that I came up with:
/// <summary>
/// Invokes in parallel the specified action for each element in the source,
/// checking the associated CancellationToken before invoking the action.
/// </summary>
public static void ForAll2<TSource>(this ParallelQuery<TSource> source,
Action<TSource> action)
{
foreach (var _ in source.Select(item => { action(item); return 0; })) { }
}
The Select operator projects the ParallelQuery<TSource> to a ParallelQuery<int> with zero values, which is then enumerated with an empty foreach loop. The action is invoked in parallel as a side-effect of the enumeration.
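Usage is the same as with ForAll (a sketch, reusing the query from the question):
Enumerable.Range(1, 20)
    .AsParallel()
    .WithDegreeOfParallelism(2)
    .WithCancellation(cts.Token)
    .ForAll2(x =>
    {
        Console.WriteLine($"Processing item #{x}");
        Thread.Sleep(200);
    });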
Online demo.
Related
I'm writing a batching pipeline that processes X outstanding operations every Y seconds. It feels like System.Reactive would be a good fit for this, but I'm not able to get the subscriber to execute in parallel. My code looks like this:
var subject = new Subject<int>();
var concurrentCount = 0;
using var reader = subject
.Buffer(TimeSpan.FromSeconds(1), 100)
.Subscribe(list =>
{
var c = Interlocked.Increment(ref concurrentCount);
if (c > 1) Console.WriteLine("Executing {0} simultaneous batches", c); // This never gets printed, because Subscribe is only ever called on a single thread.
Interlocked.Decrement(ref concurrentCount);
});
Parallel.For(0, 1_000_000, i =>
{
subject.OnNext(i);
});
subject.OnCompleted();
Is there an elegant way to read from this buffered Subject, in a concurrent manner?
The Rx subscription code is always synchronous¹. What you need to do is to remove the processing code from the Subscribe delegate, and make it a side-effect of the observable sequence. Here is how it can be done:
Subject<int> subject = new();
int concurrentCount = 0;
Task processor = subject
.Buffer(TimeSpan.FromSeconds(1), 100)
.Select(list => Observable.Defer(() => Observable.Start(() =>
{
int c = Interlocked.Increment(ref concurrentCount);
if (c > 1) Console.WriteLine($"Executing {c} simultaneous batches");
Interlocked.Decrement(ref concurrentCount);
})))
.Merge(maxConcurrent: 2)
.DefaultIfEmpty() // Prevents exception in corner case (empty source)
.ToTask(); // or RunAsync (either one starts the processor)
for (int i = 0; i < 1_000_000; i++)
{
subject.OnNext(i);
}
subject.OnCompleted();
processor.Wait();
The Select+Observable.Defer+Observable.Start combination converts the source sequence to an IObservable<IObservable<Unit>>. It's a nested sequence, with each inner sequence representing the processing of one list. When the delegate of the Observable.Start completes, the inner sequence emits a Unit value and then completes. The wrapping Defer operator ensures that the inner sequences are "cold", so that they are not started before they are subscribed. Then follows the Merge operator, which unwraps the outer sequence to a flat IObservable<Unit> sequence. The maxConcurrent parameter configures how many of the inner sequences will be subscribed concurrently. Every time an inner sequence is subscribed by the Merge operator, the corresponding Observable.Start delegate starts running on a ThreadPool thread.
If you set the maxConcurrent too high, the ThreadPool may run out of workers (in other words it may become saturated), and the concurrency of your code will then become dependent on the ThreadPool availability. If you wish, you can increase the number of workers that the ThreadPool creates instantly on demand, by using the ThreadPool.SetMinThreads method. But if your workload is CPU-bound, and you increase the worker threads above the Environment.ProcessorCount value, then most probably your CPU will be saturated instead.
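For example (a sketch; the numbers are arbitrary and should be tuned to your workload):
// Allow the ThreadPool to grow to at least 16 workers without the usual ramp-up delay.
ThreadPool.SetMinThreads(workerThreads: 16, completionPortThreads: 16);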
If your workload is asynchronous, you can replace the Observable.Defer+Observable.Start combo with the Observable.FromAsync operator, as shown here.
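A minimal sketch of that asynchronous variant, assuming a hypothetical ProcessAsync method of your own:
// Hypothetical asynchronous workload; substitute your own processing logic.
async Task ProcessAsync(IList<int> list) => await Task.Delay(100);

Task processor = subject
    .Buffer(TimeSpan.FromSeconds(1), 100)
    .Select(list => Observable.FromAsync(() => ProcessAsync(list)))
    .Merge(maxConcurrent: 2)
    .DefaultIfEmpty()
    .ToTask();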
¹ There is an unpublished library, AsyncRx.NET, that plays with the idea of asynchronous subscriptions. It is based on the new interfaces IAsyncObservable<T> and IAsyncObserver<T>.
You say this:
// This never gets printed, because Subscribe is only ever called on a single thread.
It's just not true. The reason nothing gets printed is that the code in the Subscribe runs in a locked manner: only one thread at a time executes inside Subscribe, so you are incrementing the value and then decrementing it almost immediately. And since it starts at zero, it never has a chance to rise above 1.
Now that's just because of the Rx contract. Only one thread in subscribe at once.
We can fix that.
Try this code:
using var reader = subject
.Buffer(TimeSpan.FromSeconds(1), 100)
.SelectMany(list =>
Observable
.Start(() =>
{
var c = Interlocked.Increment(ref concurrentCount);
Console.WriteLine("Starting {0} simultaneous batches", c);
})
.Finally(() =>
{
var c = Interlocked.Decrement(ref concurrentCount);
Console.WriteLine("Ending {0} simultaneous batches", c);
}))
.Subscribe();
Now when I run it (with fewer than the 1_000_000 iterations that you set) I get output like this:
Starting 1 simultaneous batches
Starting 4 simultaneous batches
Ending 3 simultaneous batches
Ending 2 simultaneous batches
Starting 3 simultaneous batches
Starting 3 simultaneous batches
Ending 1 simultaneous batches
Ending 2 simultaneous batches
Starting 4 simultaneous batches
Starting 5 simultaneous batches
Ending 3 simultaneous batches
Starting 2 simultaneous batches
Starting 2 simultaneous batches
Ending 2 simultaneous batches
Starting 3 simultaneous batches
Ending 0 simultaneous batches
Ending 4 simultaneous batches
Ending 1 simultaneous batches
Starting 1 simultaneous batches
Starting 1 simultaneous batches
Ending 0 simultaneous batches
Ending 0 simultaneous batches
I am evaluating the Polly library in terms of features and flexibility, and as part of the evaluation process I am trying to combine the WaitAndRetryPolicy and BulkheadPolicy policies, to achieve a combination of resiliency and throttling. The problem is that the resulting behavior of this combination does not match my expectations and preferences. What I would like is to prioritize the retrying of failed operations over executing fresh/unprocessed operations.
The rationale is that (from my experience) a failed operation has a greater chance of failing again. So if all failed operations get pushed to the end of the whole process, that last part of the process will be painfully slow and unproductive, not only because these operations may fail again, but also because of the required delay between retries, which may need to grow progressively longer after each failed attempt. So what I want is for the BulkheadPolicy, each time it has room to start a new operation, to pick a retry operation if there is one waiting in its queue.
Here is an example that demonstrates the undesirable behavior I would like to fix. 10 items need to be processed. All fail on their first attempt and succeed on their second attempt, resulting in a total of 20 executions. The waiting period before retrying an item is one second. Only 2 operations should be active at any moment:
var policy = Policy.WrapAsync
(
Policy
.Handle<HttpRequestException>()
.WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1)),
Policy.BulkheadAsync(
maxParallelization: 2, maxQueuingActions: Int32.MaxValue)
);
var tasks = new List<Task>();
foreach (var item in Enumerable.Range(1, 10))
{
int attempt = 0;
tasks.Add(policy.ExecuteAsync(async () =>
{
attempt++;
Console.WriteLine($"{DateTime.Now:HH:mm:ss} Starting #{item}/{attempt}");
await Task.Delay(1000);
if (attempt == 1) throw new HttpRequestException();
}));
}
await Task.WhenAll(tasks);
Output (actual):
09:07:12 Starting #1/1
09:07:12 Starting #2/1
09:07:13 Starting #3/1
09:07:13 Starting #4/1
09:07:14 Starting #5/1
09:07:14 Starting #6/1
09:07:15 Starting #8/1
09:07:15 Starting #7/1
09:07:16 Starting #10/1
09:07:16 Starting #9/1
09:07:17 Starting #2/2
09:07:17 Starting #1/2
09:07:18 Starting #4/2
09:07:18 Starting #3/2
09:07:19 Starting #5/2
09:07:19 Starting #6/2
09:07:20 Starting #7/2
09:07:20 Starting #8/2
09:07:21 Starting #10/2
09:07:21 Starting #9/2
The expected output should be something like this (I wrote it by hand):
09:07:12 Starting #1/1
09:07:12 Starting #2/1
09:07:13 Starting #3/1
09:07:13 Starting #4/1
09:07:14 Starting #1/2
09:07:14 Starting #2/2
09:07:15 Starting #3/2
09:07:15 Starting #4/2
09:07:16 Starting #5/1
09:07:16 Starting #6/1
09:07:17 Starting #7/1
09:07:17 Starting #8/1
09:07:18 Starting #5/2
09:07:18 Starting #6/2
09:07:19 Starting #7/2
09:07:19 Starting #8/2
09:07:20 Starting #9/1
09:07:20 Starting #10/1
09:07:22 Starting #9/2
09:07:22 Starting #10/2
For example, at the 09:07:14 mark the 1-second wait period of the failed item #1 has expired, so its second attempt should be prioritized over starting the first attempt of item #5.
An unsuccessful attempt to solve this problem is to reverse the order of the two policies. Unfortunately, putting the BulkheadPolicy before the WaitAndRetryPolicy results in reduced parallelization. What happens is that the BulkheadPolicy considers all retries of an item to be a single operation, and so the "wait" phase between two retries counts towards the parallelization limit. Obviously I don't want that. The documentation also makes it clear that the order of the two policies in my example is correct:
BulkheadPolicy: Usually innermost unless wraps a final TimeoutPolicy. Certainly inside any WaitAndRetry. The Bulkhead intentionally limits the parallelization. You want that parallelization devoted to running the delegate, not occupied by waits for a retry.
Is there any way to achieve the behavior I want, while staying in the realm of the Polly library?
I found a simple but not perfect solution to this problem. The solution is to include a second BulkheadPolicy positioned before the WaitAndRetryPolicy (in an "outer" position). This extra Bulkhead will serve only for reprioritizing the workload (by serving as an outer queue), and should have a substantially larger capacity (x10 or more) than the inner Bulkhead that controls the parallelization. The reason is that the outer Bulkhead could also affect (reduce) the parallelization in an unpredictable way, and we don't want that. This is why I consider this solution imperfect: neither is the prioritization optimal, nor is it guaranteed that the parallelization will not be affected.
Here is the combined policy of the original example, enhanced with an outer BulkheadPolicy. Its capacity is only 2.5 times larger, which is suitable for this contrived example, but too small for the general case:
var policy = Policy.WrapAsync
(
Policy.BulkheadAsync( // For improving prioritization
maxParallelization: 5, maxQueuingActions: Int32.MaxValue),
Policy
.Handle<HttpRequestException>()
.WaitAndRetryAsync(retryCount: 1, _ => TimeSpan.FromSeconds(1)),
Policy.BulkheadAsync( // For controlling parallelization
maxParallelization: 2, maxQueuingActions: Int32.MaxValue)
);
And here is the output of the execution:
12:36:02 Starting #1/1
12:36:02 Starting #2/1
12:36:03 Starting #3/1
12:36:03 Starting #4/1
12:36:04 Starting #2/2
12:36:04 Starting #5/1
12:36:05 Starting #1/2
12:36:05 Starting #3/2
12:36:06 Starting #6/1
12:36:06 Starting #4/2
12:36:07 Starting #8/1
12:36:07 Starting #5/2
12:36:08 Starting #9/1
12:36:08 Starting #7/1
12:36:09 Starting #10/1
12:36:09 Starting #6/2
12:36:10 Starting #7/2
12:36:10 Starting #8/2
12:36:11 Starting #9/2
12:36:11 Starting #10/2
Although this solution is not perfect, I believe that it should do more good than harm in the general case, and should result in better performance overall.
In a LINQ Query, I have used .AsParallel as follows:
var completeReservationItems = from rBase in reservation.AsParallel()
join rRel in relationship.AsParallel() on rBase.GroupCode equals rRel.SourceGroupCode
join rTarget in reservation.AsParallel() on rRel.TargetCode equals rTarget.GroupCode
where rRel.ProgramCode == programCode && rBase.StartDate <= rTarget.StartDate && rBase.EndDate >= rTarget.EndDate
select new Object
{
//Initialize based on the query
};
Then I created two separate Tasks and ran them in parallel, passing the same lists to both methods, as follows:
Task getS1Status = Task.Factory.StartNew(
() =>
{
RunLinqQuery(params);
});
Task getS2Status = Task.Factory.StartNew(
() =>
{
RunLinqQuery(params);
});
Task.WaitAll(getS1Status, getS2Status);
I was capturing the timings and was surprised to see that the timings were as follows:
Above scenario: 6 sec (6000 ms)
Same code, running sequentially instead of 2 Tasks: 50 ms
Same code, but without .AsParallel() in the LINQ: 50 ms
I wanted to understand why this is taking so long in the above scenario.
Posting this as an answer only because I have some code to show.
Firstly, I don't know how many threads will be created with AsParallel(). The documentation doesn't say anything about it: https://msdn.microsoft.com/en-us/library/dd413237(v=vs.110).aspx
Imagine the following code:
void RunMe()
{
foreach (var threadId in Enumerable.Range(0, 100)
.AsParallel()
.Select(x => Thread.CurrentThread.ManagedThreadId)
.Distinct())
Console.WriteLine(threadId);
}
How many thread ids will we see? For me it is a different number of threads each time; example output:
30 // only one thread!
Next time
27 // several threads
13
38
10
43
30
I think the number of threads depends on the current scheduler. We can always define the maximum number of threads by calling the WithDegreeOfParallelism method (https://msdn.microsoft.com/en-us/library/dd383719(v=vs.110).aspx), for example:
void RunMe()
{
foreach (var threadId in Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(2)
.Select(x => Thread.CurrentThread.ManagedThreadId)
.Distinct())
Console.WriteLine(threadId);
}
Now the output will contain at most 2 thread ids.
7
40
Why is this important? As I said, the number of threads can directly influence performance.
But that is not the only problem. In your first scenario, you are creating new tasks (which run on the thread pool and can add extra overhead), and then you are calling Task.WaitAll. Take a look at its source code: https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Task.cs,72b6b3fa5eb35695 I'm sure the loop over the tasks there adds additional overhead, and in a situation where AsParallel grabs too many threads inside the first task, the next task may not be able to start right away. Moreover, this CAN happen, so if you run your first scenario 1000 times, you will probably get very different results.
So, my last point is that you are trying to measure parallel code, and it is very hard to do that right. I don't recommend using parallel constructs everywhere you can, because they can cause performance degradation if you don't know exactly what you are doing.
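If you do want to measure it, a rough sketch that at least repeats the measurement so the variance between runs becomes visible (the run count is arbitrary):
var sw = new Stopwatch();
for (int run = 0; run < 5; run++)
{
    sw.Restart();
    RunLinqQuery(/* params */); // the method from the question
    sw.Stop();
    Console.WriteLine($"Run {run}: {sw.ElapsedMilliseconds} ms");
}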
I use 2 Parallel.ForEach nested loops to retrieve information quickly from a url. This is the code:
while (searches.Count > 0)
{
Parallel.ForEach(searches, (search, loopState) =>
{
Parallel.ForEach(search.items, item =>
{
RetrieveInfo(item);
}
);
}
);
}
The outer ForEach has a list of, for example, 10 items, whilst the inner ForEach has a list of 5. This means that I'm going to query the url 50 times, however I query it 5 times simultaneously (inner ForEach).
I need to add a delay for the inner loop so that after it queries the url, it waits for x seconds - the time taken for the inner loop to complete the 5 requests.
Using Thread.Sleep is not a good idea because it will block the complete thread and possibly the other parallel tasks.
Is there an alternative that might work?
To my understanding, you have 50 tasks and you wish to process 5 of them at a time.
If so, you should look into ParallelOptions.MaxDegreeOfParallelism to process the 50 tasks with a maximum degree of parallelism of 5. When one task stops, another task is permitted to start.
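A minimal sketch of that approach, assuming the searches/items/RetrieveInfo shapes from the question:
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };

Parallel.ForEach(
    searches.SelectMany(s => s.items), // flatten to the 50 individual items
    options,
    item => RetrieveInfo(item));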
If you wish to have tasks processed in chunks of five, followed by another chunk of five (as in, you wish to process chunks in serial), then you would want code similar to
// Process the flattened items in chunks of 5; each chunk completes before the next starts.
// (Chunk requires .NET 6+; on older versions, group the items manually.)
foreach (var chunk in searches.SelectMany(s => s.items).Chunk(5))
{
    Parallel.ForEach(chunk, item => RetrieveInfo(item));
}
I'm investigating the Parallelism Break in a For loop.
After reading this and this I still have a question:
I'd expect this code:
Parallel.For(0, 10, (i, state) =>
{
    Console.WriteLine(i);
    if (i == 5) state.Break();
});
To yield at most 6 numbers (0..6).
Not only does it not do that, but the results also vary in length:
02351486
013542
0135642
Very annoying. (where the hell is Break() {after 5} here ??)
So I looked at MSDN:
Break may be used to communicate to the loop that no other iterations after the current iteration need be run.
If Break is called from the 100th iteration of a for loop iterating in
parallel from 0 to 1000, all iterations less than 100 should still be
run, but the iterations from 101 through to 1000 are not necessary.
Question #1:
Which iterations? The overall iteration counter, or per thread? I'm pretty sure it is per thread; please confirm.
Question #2:
Let's assume we are using Parallel + range partitioning (since there is no CPU cost change between elements), so it divides the data among the threads. So if we have 4 cores (and a perfect division among them):
core #1 got 0..250
core #2 got 251..500
core #3 got 501..750
core #4 got 751..1000
So the thread on core #1 will meet value=100 at some point and will break.
That will be its iteration number 100.
But the thread on core #4 got more quanta and is at 900 now; it is way beyond its 100th iteration.
It doesn't have any index less than 100 left to stop at, so it will show them all.
Am I right? Is that the reason why I get more than 5 elements in my example?
Question #3:
How can I truly break when (i == 5)?
P.S.
I mean, come on! When I call Break(), I want the loop to stop,
exactly as it does in a regular for loop.
To yield at most 6 numbers (0..6).
The problem is that this won't yield at most 6 numbers.
What happens is that when you hit the iteration with an index of 5, you send the "break" request. Break() will cause the loop to no longer schedule any values >5, but to process all values <5.
However, any values greater than 5 which were already started will still get processed. Since the various indices are running in parallel, they're no longer ordered, so you get various runs where some values >5 (such as 8 in your example) are still being executed.
Which iterations? The overall iteration counter, or per thread? I'm pretty sure it is per thread; please confirm.
This refers to the index being passed into Parallel.For. Break() won't immediately prevent items from being processed; it guarantees that all items with an index up to 100 get processed, while items above 100 may or may not get processed.
Am I right? Is that the reason why I get more than 5 elements in my example?
Yes. If you use a partitioner like you've shown, as soon as you call Break(), items beyond the one where you break will no longer get scheduled. However, items (which is the entire partition) already scheduled will get processed fully. In your example, this means you're likely to always process all 1000 items.
How can I truly break when (i == 5)?
You are - but when you run in Parallel, things change. What is the actual goal here? If you only want to process the first 6 items (0-5), you should restrict the items before you loop through them via a LINQ query or similar. You can then process the 6 items in Parallel.For or Parallel.ForEach without a Break() and without worry.
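For example, a sketch of that "restrict first" approach, applied to the loop from the question:
// Restrict to the items you actually want (0..5), then process them in parallel without Break().
Parallel.ForEach(Enumerable.Range(0, 6), i => Console.WriteLine(i));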
I mean, come on! When I call Break(), I want the loop to stop, exactly as it does in a regular for loop.
You should use Stop() instead of Break() if you want things to stop as quickly as possible. This will not stop items that are already running, but it will prevent any further items from being scheduled (including ones at lower indices or earlier in the enumeration than your current position).
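A minimal sketch of the Stop() variant, applied to the loop from the question:
Parallel.For(0, 10, (i, state) =>
{
    if (state.IsStopped) return;  // iterations already running can bail out early
    Console.WriteLine(i);
    if (i == 5) state.Stop();     // no further iterations will be scheduled
});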
If Break is called from the 100th iteration of a for loop iterating in parallel from 0 to 1000
The 100th iteration of the loop is not necessarily (in fact probably not) the one with the index 99.
Your threads can and will run in an indeterminate order. When the .Break() instruction is encountered, no further loop iterations will be started. Exactly when that happens depends on the specifics of thread scheduling for a particular run.
I strongly recommend reading Patterns of Parallel Programming (a free PDF from Microsoft) to understand the design decisions and tradeoffs that went into the TPL.
Which iterations? The overall iteration counter, or per thread?
Of all the iterations scheduled (or yet to be scheduled).
Remember that the delegates may run out of order; there is no guarantee that iteration i == 5 will be the sixth to execute. In fact, that is unlikely to be the case except in rare situations.
Q2: Am I right?
No, the scheduling is not that simplistic. Rather, all the tasks are queued up and then the queue is processed. However, each thread uses its own queue until it is empty, at which point it steals work from the other threads. This means there is no way to predict which thread will process which delegate.
If the delegates are sufficiently trivial it might all be processed on the original calling thread (no other thread gets a chance to steal work).
Q3: How can I truly break when (i == 5)?
Don't use concurrency if you want linear (i.e., sequential) processing.
The Break method is there to support speculative execution: try various ways and stop as soon as any one completes.
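A sketch of that speculative pattern (the data and match condition are made up for illustration); because Break() guarantees that all lower-index iterations still run, the lowest matching index can be captured reliably:
int[] data = Enumerable.Range(0, 1_000).Select(x => x * 3).ToArray();
object locker = new object();
int lowestMatch = int.MaxValue;
Parallel.For(0, data.Length, (i, state) =>
{
    if (data[i] % 7 == 0) // hypothetical match condition
    {
        lock (locker) { lowestMatch = Math.Min(lowestMatch, i); }
        state.Break();
    }
});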