I have a problem connecting BroadcastBlocks to BatchBlocks. The scenario is that the sources are BroadcastBlocks and the recipients are BatchBlocks.
In the simplified code below, only one of the downstream action blocks executes. I even set the batchSize of each BatchBlock to 1 to illustrate the problem.
Setting Greedy to true would make both ActionBlocks execute, but that's not what I want, because it would let a BatchBlock proceed even when its batch is not complete yet. Any ideas?
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
static void Main(string[] args)
{
// My possible sources are BroadcastBlocks. Could be more
var source1 = new BroadcastBlock<int>(z => z);
// batch 1
// can be many potential sources, one for now
// I want all sources to arrive first before proceeding
var batch1 = new BatchBlock<int>(1, new GroupingDataflowBlockOptions() { Greedy = false });
var batch1Action = new ActionBlock<int[]>(arr =>
{
// this does not run sometimes
Console.WriteLine("Received from batch 1 block!");
foreach (var item in arr)
{
Console.WriteLine("Received {0}", item);
}
});
batch1.LinkTo(batch1Action, new DataflowLinkOptions() { PropagateCompletion = true });
// batch 2
// can be many potential sources, one for now
// I want all sources to arrive first before proceeding
var batch2 = new BatchBlock<int>(1, new GroupingDataflowBlockOptions() { Greedy = false });
var batch2Action = new ActionBlock<int[]>(arr =>
{
// this does not run sometimes
Console.WriteLine("Received from batch 2 block!");
foreach (var item in arr)
{
Console.WriteLine("Received {0}", item);
}
});
batch2.LinkTo(batch2Action, new DataflowLinkOptions() { PropagateCompletion = true });
// connect source(s)
source1.LinkTo(batch1, new DataflowLinkOptions() { PropagateCompletion = true });
source1.LinkTo(batch2, new DataflowLinkOptions() { PropagateCompletion = true });
// fire
source1.SendAsync(3);
Task.WaitAll(batch1Action.Completion, batch2Action.Completion);
Console.ReadLine();
}
}
It seems that there is a flaw in the internal mechanism of the TPL Dataflow library that supports the non-greedy functionality. A BatchBlock configured as non-greedy will postpone all messages offered by linked blocks instead of accepting them. It keeps an internal queue of the postponed messages, and when their number reaches its BatchSize it attempts to consume them; if that succeeds, it propagates them downstream as expected. The problem is that source blocks like the BroadcastBlock and the BufferBlock stop offering more messages to a block that has postponed a previously offered message, until that single message has been consumed. The combination of these two behaviors results in a deadlock: no forward progress can be made, because the BatchBlock waits for more messages to be offered before consuming the postponed ones, and the BroadcastBlock waits for the postponed message to be consumed before offering more messages.
This scenario occurs only with BatchSize larger than one (which is the typical configuration for this block).
Here is a demonstration of this problem. The more common BufferBlock is used as the source instead of a BroadcastBlock. 10 messages are posted to a three-block pipeline, and the expected behavior is for the messages to flow through the pipeline to the last block. In reality nothing happens, and all messages remain stuck in the first block.
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;
public static class Program
{
static void Main(string[] args)
{
var bufferBlock = new BufferBlock<int>();
var batchBlock = new BatchBlock<int>(batchSize: 2,
new GroupingDataflowBlockOptions() { Greedy = false });
var actionBlock = new ActionBlock<int[]>(batch =>
Console.WriteLine($"Received: {String.Join(", ", batch)}"));
bufferBlock.LinkTo(batchBlock,
new DataflowLinkOptions() { PropagateCompletion = true });
batchBlock.LinkTo(actionBlock,
new DataflowLinkOptions() { PropagateCompletion = true });
for (int i = 1; i <= 10; i++)
{
var accepted = bufferBlock.Post(i);
Console.WriteLine(
$"bufferBlock.Post({i}) {(accepted ? "accepted" : "rejected")}");
Thread.Sleep(100);
}
bufferBlock.Complete();
actionBlock.Completion.Wait(millisecondsTimeout: 1000);
Console.WriteLine();
Console.WriteLine($"bufferBlock.Completion: {bufferBlock.Completion.Status}");
Console.WriteLine($"batchBlock.Completion: {batchBlock.Completion.Status}");
Console.WriteLine($"actionBlock.Completion: {actionBlock.Completion.Status}");
Console.WriteLine($"bufferBlock.Count: {bufferBlock.Count}");
}
}
Output:
bufferBlock.Post(1) accepted
bufferBlock.Post(2) accepted
bufferBlock.Post(3) accepted
bufferBlock.Post(4) accepted
bufferBlock.Post(5) accepted
bufferBlock.Post(6) accepted
bufferBlock.Post(7) accepted
bufferBlock.Post(8) accepted
bufferBlock.Post(9) accepted
bufferBlock.Post(10) accepted
bufferBlock.Completion: WaitingForActivation
batchBlock.Completion: WaitingForActivation
actionBlock.Completion: WaitingForActivation
bufferBlock.Count: 10
My guess is that the internal offer-consume-reserve-release mechanism has been tuned for maximum efficiency at supporting the BoundedCapacity functionality, which is critical for many applications, and the rarely used Greedy = false functionality was not as thoroughly tested.
The good news is that in your case you don't really need to set Greedy to false. A BatchBlock in the default greedy mode will not propagate fewer messages than the configured BatchSize, unless it has been marked as completed (in which case it propagates any leftover messages), or you manually call its TriggerBatch method. The intended usage of the non-greedy configuration is to prevent resource starvation in complex graph scenarios with multiple dependencies between blocks.
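For reference, below is a minimal sketch of the original scenario in the default greedy mode, with two BroadcastBlock sources feeding a single BatchBlock with BatchSize = 2 (the two-source setup is my assumption, extrapolated from the question). Note that a greedy BatchBlock groups by count only, so it cannot guarantee that each batch contains exactly one message per source; it only guarantees that no batch smaller than BatchSize is propagated before completion or TriggerBatch.
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class GreedyBatchDemo
{
    public static async Task Main()
    {
        var source1 = new BroadcastBlock<int>(x => x);
        var source2 = new BroadcastBlock<int>(x => x);

        // Default (greedy) mode: messages are accepted immediately,
        // but nothing is propagated until BatchSize messages have arrived.
        var batch = new BatchBlock<int>(batchSize: 2);

        var action = new ActionBlock<int[]>(arr =>
            Console.WriteLine($"Received batch: {string.Join(", ", arr)}"));

        batch.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });
        source1.LinkTo(batch);
        source2.LinkTo(batch);

        await source1.SendAsync(3);   // batch holds one message, nothing propagated yet
        await source2.SendAsync(42);  // batch is full, the int[] is propagated downstream

        batch.Complete();             // would also flush any leftover partial batch
        await action.Completion;
    }
}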
Related
Consider the following pipeline:
public static void Main()
{
var firstBlock = new TransformBlock<string, string>(s =>
{
Console.WriteLine("FirstBlock");
return s + "-FirstBlock";
});
var batchBlock = new BatchBlock<string>(100, new GroupingDataflowBlockOptions { Greedy = true });
var afterBatchBlock = new TransformManyBlock<string[], string>(strings =>
{
Console.WriteLine("AfterBatchBlock");
return new[] { strings[0] + "-AfterBatchBlock" };
});
var lastBlock = new ActionBlock<string>(s =>
{
Console.WriteLine($"LastBlock {s}");
});
firstBlock.LinkTo(batchBlock, new DataflowLinkOptions { PropagateCompletion = true }, x => x.Contains("0"));
batchBlock.LinkTo(afterBatchBlock, new DataflowLinkOptions { PropagateCompletion = true });
afterBatchBlock.LinkTo(lastBlock, new DataflowLinkOptions { PropagateCompletion = true });
firstBlock.LinkTo(lastBlock, new DataflowLinkOptions { PropagateCompletion = true });
firstBlock.Post("0");
firstBlock.Complete();
firstBlock.Completion.GetAwaiter().GetResult();
batchBlock.Completion.GetAwaiter().GetResult();
afterBatchBlock.Completion.GetAwaiter().GetResult();
lastBlock.Completion.GetAwaiter().GetResult();
}
Running this code gets blocked with the following output:
FirstBlock
AfterBatchBlock
What I think is happening behind the scenes is the following:
FirstBlock sends a Completion signal to all its linked targets (BatchBlock, AfterBatchBlock, LastBlock)
BatchBlock and LastBlock get completed
AfterBatchBlock then tries to pass its output to LastBlock, but LastBlock is already completed, so AfterBatchBlock gets stuck forever.
My question is:
Is it a bug?
Assuming it is by design, what is the recommended way to overcome the bad state my pipeline has reached?
No, it's not a bug. This is the expected behavior. You must first realize that you are not dealing with a simple, straightforward dataflow pipeline, but rather with a dataflow mesh. It's not a complex mesh, but it's still not what I would describe as a pipeline. Dealing with the completion of dataflow blocks that form a mesh can be tricky. In your case you have one block, the lastBlock, which is the target of two source blocks, the firstBlock and the afterBatchBlock. This block should be completed when both source blocks have completed, not when either one of them has. So you can't use the PropagateCompletion = true option when linking this block with its sources.
There is no built-in API to do this, but implementing a two-to-one completion propagator is not very difficult. In fact you could just copy-paste the PropagateCompletion method found in this answer, and use it like this:
afterBatchBlock.LinkTo(lastBlock);
firstBlock.LinkTo(lastBlock);
PropagateCompletion(new IDataflowBlock[] { afterBatchBlock, firstBlock },
new[] { lastBlock });
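The PropagateCompletion method itself lives in the linked answer and isn't reproduced above, so here is a hedged sketch of what such a many-to-one completion propagator could look like (my own illustration, not the exact code from that answer): each target is completed once all sources have completed, and faulted if any source has faulted.
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowCompletion
{
    // Completes (or faults) every target when ALL of the sources have completed.
    public static void PropagateCompletion(IDataflowBlock[] sources, IDataflowBlock[] targets)
    {
        Task.WhenAll(sources.Select(source => source.Completion)).ContinueWith(allSources =>
        {
            foreach (var target in targets)
            {
                if (allSources.IsFaulted)
                    target.Fault(allSources.Exception); // propagate the AggregateException
                else
                    target.Complete();
            }
        }, TaskScheduler.Default);
    }
}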
Typical situation: Fast producer, slow consumer, need to slow producer down.
Sample code that doesn't work as I expected (explained below):
// I assumed this block will behave like BlockingCollection, but it doesn't
var bb = new BufferBlock<int>(new DataflowBlockOptions {
BoundedCapacity = 3, // looks like this does nothing
});
// fast producer
int dataSource = -1;
var producer = Task.Run(() => {
while (dataSource < 10) {
var message = ++dataSource;
bb.Post(message);
Console.WriteLine($"Posted: {message}");
}
Console.WriteLine("Calling .Complete() on buffer block");
bb.Complete();
});
// slow consumer
var ab = new ActionBlock<int>(i => {
Thread.Sleep(500);
Console.WriteLine($"Received: {i}");
}, new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 2,
});
bb.LinkTo(ab);
ab.Completion.Wait();
How I thought this code would work, but it doesn't:
BufferBlock bb is the blocking queue with a capacity of 3. Once capacity is reached, the producer should not be able to .Post() to it until there's a vacant slot.
Doesn't work like that. bb seems to happily accept any number of messages.
producer is a task that quickly Posts messages. Once all messages have been posted, the call to bb.Complete() should propagate through the pipeline and signal shutdown once all messages have been processed. Hence the ab.Completion.Wait() at the end.
Doesn't work either. As soon as .Complete() is called, action block ab won't receive any more messages.
This can be done with a BlockingCollection, which I thought BufferBlock was the equivalent of in the TPL Dataflow (TDF) world. I guess I'm misunderstanding how backpressure is supposed to work in TPL Dataflow.
So where's the catch? How to run this pipeline, not allowing more than 3 messages in the buffer bb, and wait for its completion?
PS: I found this gist (https://gist.github.com/mnadel/df2ec09fe7eae9ba8938) where it's suggested to maintain a semaphore to block writing to BufferBlock. I thought this was "built-in".
Update after accepting the answer:
If you're looking at this question, you need to remember that ActionBlock also has its own input buffer.
That's for one. Then you also need to realize that, because all blocks have their own input buffers, you don't need a BufferBlock for what its name might imply. A BufferBlock is more of a utility block for more complex architectures, or a load-balancing block; it's not a backpressure buffer.
Completion propagation needs to be defined explicitly at the link level.
When calling .LinkTo() you need to explicitly pass new DataflowLinkOptions { PropagateCompletion = true } as the 2nd argument.
To introduce backpressure you need to use SendAsync when you send items into the block. This allows your producer to wait for the block to be ready for the item. Something like this is what you're looking for:
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class Program
{
static async Task Main()
{
var options = new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 3
};
var block = new ActionBlock<int>(async i =>
{
await Task.Delay(100);
Console.WriteLine(i);
}, options);
//Producer
foreach (var i in Enumerable.Range(0, 10))
{
await block.SendAsync(i);
}
block.Complete();
await block.Completion;
}
}
If you change this to use Post and print the result of the Post you'll see that many items fail to be passed to the block:
class Program
{
static async Task Main()
{
var options = new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 1
};
var block = new ActionBlock<int>(async i =>
{
await Task.Delay(1000);
Console.WriteLine(i);
}, options);
//Producer
foreach (var i in Enumerable.Range(0, 10))
{
var result = block.Post(i);
Console.WriteLine(result);
}
block.Complete();
await block.Completion;
}
}
Output:
True
False
False
False
False
False
False
False
False
False
0
With the guidance from JSteward's answer, I came up with the following code.
It produces (reads etc.) new items concurrently with processing said items, maintaining a read-ahead buffer.
The completion signal is sent to the head of the chain when the "producer" has no more items.
The program also awaits the completion of the whole chain before terminating.
static async Task Main() {
string Time() => $"{DateTime.Now:hh:mm:ss.fff}";
// the buffer is added to the chain just for demonstration purposes
// the chain would work fine using just the built-in input buffer
// of the `action` block.
var buffer = new BufferBlock<int>(new DataflowBlockOptions {BoundedCapacity = 3});
var action = new ActionBlock<int>(async i =>
{
Console.WriteLine($"[{Time()}]: Processing: {i}");
await Task.Delay(500);
}, new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 2, BoundedCapacity = 2});
// it's necessary to set `PropagateCompletion` property
buffer.LinkTo(action, new DataflowLinkOptions {PropagateCompletion = true});
//Producer
foreach (var i in Enumerable.Range(0, 10))
{
Console.WriteLine($"[{Time()}]: Ready to send: {i}");
await buffer.SendAsync(i);
Console.WriteLine($"[{Time()}]: Sent: {i}");
}
// we call `.Complete()` on the head of the chain and it's propagated forward
buffer.Complete();
await action.Completion;
}
I'm trying to do a parallel SqlBulkCopy to multiple targets over a WAN, many of which may have slow connections and/or connection cutoffs; their connection speeds vary from 2 to 50 Mbit/s download, and I am sending from a connection with 1000 Mbit/s upload; a lot of the targets need multiple retries to finish correctly.
I'm currently using a Parallel.ForEach on the GetConsumingEnumerable() of a BlockingCollection (queue); however I either stumbled upon some bug, or I am having problems fully understanding its purpose, or simply got something wrong.
The code never calls the CompleteAdding() method of the BlockingCollection; it seems that somewhere in the parallel foreach loop some of the targets get lost.
Even if there are different approaches to this, and disregarding the kind of work being done in the loop, the BlockingCollection shouldn't behave the way it does in this example, should it?
In the foreach-loop, I do the work, and add the target to a results-collection in case it completed successfully, or re-add the target to the BlockingCollection in case of an error until the target reached the max retries threshold; at that point I add it to the results-collection.
In an additional Task, I loop until the count of the results-collection equals the initial count of the targets; then I do the CompleteAdding() on the blocking collection.
I already tried using a locking object for the operations on the results-collection (using a List<int> instead) and the queue, with no luck, but that shouldn't be necessary anyways. I also tried adding the retries to a separate collection, and re-adding those to the BlockingCollection in a different Task instead of in the parallel.foreach.
Just for fun I also tried compiling with .NET from 4.5 to 4.8, and different C# language versions.
Here is a simplified example:
List<int> targets = new List<int>();
for (int i = 0; i < 200; i++)
{
targets.Add(0);
}
BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
ConcurrentBag<int> results = new ConcurrentBag<int>();
targets.ForEach(f => queue.Add(f));
// Bulkcopy in die Filialen:
Task.Run(() =>
{
while (results.Count < targets.Count)
{
Thread.Sleep(2000);
Console.WriteLine($"Completed: {results.Count} / {targets.Count} | queue: {queue.Count}");
}
queue.CompleteAdding();
});
int MAX_RETRIES = 10;
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 50 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
try
{
// simulate a problem with the bulkcopy:
throw new Exception();
results.Add(target);
}
catch (Exception)
{
if (target < MAX_RETRIES)
{
target++;
if (!queue.TryAdd(target))
Console.WriteLine($"{target.ToString("D3")}: Error, can't add to queue!");
}
else
{
results.Add(target);
Console.WriteLine($"Aborted after {target + 1} tries | {results.Count} / {targets.Count} items finished.");
}
}
});
I expected the count of the results-collection to be the exact count of the targets-list in the end, but it seems to never reach that number, which results in the BlockingCollection never being marked as completed, so the code never finishes.
I really don't understand why not all of the targets get added to the results-collection eventually! The added count always varies, and is mostly just shy of the expected final count.
EDIT: I removed the retry-part, and replaced the ConcurrentBag with a simple int-counter, and it still doesn't work most of the time:
List<int> targets = new List<int>();
for (int i = 0; i < 500; i++)
targets.Add(0);
BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
//ConcurrentBag<int> results = new ConcurrentBag<int>();
int completed = 0;
targets.ForEach(f => queue.Add(f));
var thread = new Thread(() =>
{
while (completed < targets.Count)
{
Thread.Sleep(2000);
Console.WriteLine($"Completed: {completed} / {targets.Count} | queue: {queue.Count}");
}
queue.CompleteAdding();
});
thread.Start();
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
Interlocked.Increment(ref completed);
});
Sorry, found the answer: the default partitioner used by BlockingCollection and Parallel.ForEach does chunking and buffering, which causes the foreach loop to wait forever for enough items to fill the next chunk. For me, it sat there for a whole night without processing the last few items!
So, instead of:
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
Interlocked.Increment(ref completed);
});
you have to use:
var partitioner = Partitioner.Create(queue.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(partitioner, options, target =>
{
Interlocked.Increment(ref completed);
});
Parallel.ForEach is meant for data parallelism (i.e. processing 100K rows using all 8 cores), not concurrent operations. This is essentially a pub/sub and async problem, if not a pipeline problem. There's nothing for the CPU to do in this case; just start the async operations and wait for them to complete.
.NET handles this since .NET 4.5 through the Dataflow classes and, more recently, the lower-level System.Threading.Channels namespace.
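As a side note, a minimal bounded-channel sketch of the same throttled producer/consumer idea could look like the following (the capacity, delay, and item count are illustrative assumptions, not part of the original code); the Dataflow version follows below.
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

class ChannelSketch
{
    static async Task Main()
    {
        // Bounded channel: WriteAsync waits when the buffer is full (backpressure).
        var channel = Channel.CreateBounded<int>(capacity: 8);

        var consumer = Task.Run(async () =>
        {
            await foreach (var item in channel.Reader.ReadAllAsync())
            {
                await Task.Delay(100);          // simulate a slow operation
                Console.WriteLine($"Processed {item}");
            }
        });

        for (int i = 0; i < 20; i++)
            await channel.Writer.WriteAsync(i); // waits while the channel is full

        channel.Writer.Complete();              // no more items
        await consumer;                         // wait for the consumer to drain
    }
}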
In its simplest form, you can create an ActionBlock<> that takes a buffer and target connection and publishes the data. Let's say you use this method to send the data to a server:
async Task MyBulkCopyMethod(string connectionString,DataTable data)
{
using(var bcp=new SqlBulkCopy(connectionString))
{
//Set up mappings etc.
//....
await bcp.WriteToServerAsync(data);
}
}
You can use this with an ActionBlock class with a configured degree of parallelism. Dataflow classes like ActionBlock have their own input and, where appropriate, output buffers, so there's no need to create a separate queue:
class DataMessage
{
public string Connection{get;set;}
public DataTable Data {get;set;}
}
...
var options=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 50,
BoundedCapacity = 8
};
var block = new ActionBlock<DataMessage>(msg => MyBulkCopyMethod(msg.Connection, msg.Data), options);
We can start posting messages to the block now. By setting the capacity to 8 we ensure the input buffer won't get filled with large messages if the block is too slow. MaxDegreeOfParallelism controls how many operations run concurrently. Let's say we want to send the same data to many servers:
var data=.....;
var servers=new[]{connString1, connString2,....};
var messages= from sv in servers
select new DataMessage { Connection = sv, Data = data };
foreach(var msg in messages)
{
await block.SendAsync(msg);
}
//Tell the block we are done
block.Complete();
//Await for all messages to finish processing
await block.Completion;
Retries
One possibility for retries is to use a retry loop in the worker function. A better idea would be to use a different block and post failed messages there.
var block = new ActionBlock<DataMessage>(async msg =>
{
    try
    {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
    catch (SqlException exc) when (/* some retry condition */)
    {
        // Post without awaiting
        retryBlock.Post(msg);
    }
}, options);
When the original block completes we want to tell the retry block to complete as well, no matter what :
block.Completion.ContinueWith(_=>retryBlock.Complete());
Now we can await the retryBlock to complete.
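Putting the shutdown sequence together (a small sketch using the block and retryBlock defined in this answer):
block.Complete();            // stop accepting new messages
await block.Completion;      // all first attempts done; the continuation above completes retryBlock
await retryBlock.Completion; // all retries done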
That block could have a smaller DOP and perhaps a delay between attempts :
var retryOptions=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 5
};
var retryBlock = new ActionBlock<DataMessage>(async msg =>
{
    await Task.Delay(1000);
    try
    {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
    catch (Exception /* ... */)
    {
        // ...
    }
}, retryOptions);
This pattern can be repeated to create multiple levels of retry, or different conditions. It can also be used to create different-priority workers, by giving a larger DOP to high-priority workers, or a larger delay to low-priority workers.
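For example, a hedged sketch of chaining one more retry level (retryBlock2 and its settings are illustrative assumptions, and it presumes the catch block inside retryBlock posts its failed messages to retryBlock2):
// Second-level retry: fed by retryBlock on failure, completed after retryBlock completes.
var retryBlock2 = new ActionBlock<DataMessage>(async msg =>
{
    await Task.Delay(5000); // longer delay for the last resort
    await MyBulkCopyMethod(msg.Connection, msg.Data);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });

retryBlock.Completion.ContinueWith(_ => retryBlock2.Complete());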
I'm implementing simple data loader over HTTP, following tips from my previous question C# .NET Parallel I/O operation (with throttling), answered by Throttling asynchronous tasks.
I split loading and deserialization, assuming that one may be slower or faster than the other. Also I want to throttle downloading, but I don't want to throttle deserialization. Therefore I'm using two blocks and one buffer.
Unfortunately I'm facing a problem: this pipeline sometimes processes fewer messages than it consumed (I know from the target server that I made exactly n requests, but I end up with fewer responses).
My method looks like this (no error handling):
public async Task<IEnumerable<DummyData>> LoadAsync(IEnumerable<Uri> uris)
{
IList<DummyData> result;
using (var client = new HttpClient())
{
var buffer = new BufferBlock<DummyData>();
var downloader = new TransformBlock<Uri, string>(
async u => await client.GetStringAsync(u),
new ExecutionDataflowBlockOptions
{ MaxDegreeOfParallelism = _maxParallelism });
var deserializer =
new TransformBlock<string, DummyData>(
s => JsonConvert.DeserializeObject<DummyData>(s),
new ExecutionDataflowBlockOptions
{ MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
downloader.LinkTo(deserializer, linkOptions);
deserializer.LinkTo(buffer, linkOptions);
foreach (Uri uri in uris)
{
await downloader.SendAsync(uri);
}
downloader.Complete();
await downloader.Completion;
buffer.TryReceiveAll(out result);
}
return result;
}
So to be more specific, I have 100 URLs to load, but I get 90-99 responses. No error & server handled 100 requests. This happens randomly, most of the time code behaves correctly.
There are three issues with your code:
Awaiting the completion of the first block of the pipeline (the downloader) instead of the last (the buffer).
Using the TryReceiveAll method for retrieving the messages of the buffer block. The correct way to retrieve all messages from an unlinked block, without introducing race conditions, is to use the methods OutputAvailableAsync and TryReceive in a nested loop. You can find examples here and here; a sketch of such a loop is also shown at the end of this answer.
In case of a timeout the HttpClient throws an unexpected TaskCanceledException, and the TPL Dataflow blocks happen to ignore exceptions of this type. The combination of these two unfortunate realities means that by default any timeout occurrence will remain unobserved. To fix this problem you could change your code like this:
var downloader = new TransformBlock<Uri, string>(async url =>
{
try
{
return await client.GetStringAsync(url);
}
catch (OperationCanceledException) { throw new TimeoutException(); }
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = _maxParallelism });
A fourth unrelated issue is the use of the MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded option for the deserializer block. In the (hopefully unlikely) case that the deserializer is slower than the downloader, the deserializer will start queuing more and more work on the ThreadPool, keeping it permanently starved. This will not be good for the performance and the responsiveness of your application, or for the health of the system as a whole. In practice there is rarely a reason to configure a CPU-bound block with a MaxDegreeOfParallelism larger than Environment.ProcessorCount.
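Regarding the second issue, here is a hedged sketch of the nested receive loop as a small extension method (the ToListAsync name is my own; it is not part of the TPL Dataflow API). OutputAvailableAsync waits until output is available or the block has completed, and TryReceive drains whatever is currently buffered:
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ReceivableSourceBlockExtensions
{
    // Drains all messages from a source block until it completes.
    public static async Task<List<T>> ToListAsync<T>(this IReceivableSourceBlock<T> source)
    {
        var results = new List<T>();
        while (await source.OutputAvailableAsync())
        {
            while (source.TryReceive(null, out T item)) // null filter = receive any item
                results.Add(item);
        }
        await source.Completion; // surfaces any exception stored in the block
        return results;
    }
}
In LoadAsync, the await downloader.Completion and buffer.TryReceiveAll lines could then be replaced with result = await buffer.ToListAsync();, assuming the two LinkTo calls keep propagating completion down to the buffer.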
At work one of our processes uses a SQL database table as a queue. I've been designing a queue reader to check the table for queued work, update the row status when work starts, and delete the row when the work is finished. I'm using Parallel.ForEach to give each process its own thread and setting MaxDegreeOfParallelism to 4.
When the queue reader starts up it checks for any unfinished work and loads the work into a list, then it does a Concat with that list and a method that returns an IEnumerable which runs in an infinite loop checking for new work to do. The idea is that the unfinished work should be processed first and then the new work can be picked up as threads become available. However, what I'm seeing is that FetchQueuedWork will change dozens of rows in the queue table to 'Processing' immediately, but only work on a few items at a time.
What I expected to happen was that FetchQueuedWork would only get new work and update the table when a slot opened up in the Parallel.Foreach. What's really odd to me is that it behaves exactly as I would expect when I run the code in my local developer environment, but in production I get the above problem.
I'm using .Net 4. Here is the code:
public void Go()
{
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());
Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = 4 }, DoWork);
}
private IEnumerable<WorkData> FetchQueuedWork()
{
while (true)
{
var workUnit = WorkData.GetQueuedWorkAndSetStatusToProcessing();
yield return workUnit;
}
}
private void DoWork(WorkData workUnit)
{
if (!workUnit.Loaded)
{
System.Threading.Thread.Sleep(5000);
return;
}
Work();
}
I suspect that the default (Release mode?) behaviour is to buffer the input. You might need to create your own partitioner and pass it the NoBuffering option:
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
var partitioner = Partitioner.Create(work, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, options, DoWork);
Blorgbeard's solution is correct when it comes to .NET 4.5 - hands down.
If you are constrained to .NET 4, you have a few options:
Replace your Parallel.ForEach with work.AsParallel().WithDegreeOfParallelism(4).ForAll(DoWork). PLINQ is more conservative when it comes to buffering items, so this should do the trick (a small sketch is shown after the code below).
Write your own enumerable partitioner (good luck).
Create a grotty semaphore-based hack such as this:
(Side-effecting Select used for the sake of brevity)
public void Go()
{
const int MAX_DEGREE_PARALLELISM = 4;
using (var semaphore = new SemaphoreSlim(MAX_DEGREE_PARALLELISM, MAX_DEGREE_PARALLELISM))
{
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork
.Concat(FetchQueuedWork())
.Select(w =>
{
// Side-effect: bad practice, but easier
// than writing your own IEnumerable.
semaphore.Wait();
return w;
});
// You still need to specify MaxDegreeOfParallelism
// here so as not to saturate your thread pool when
// Parallel.ForEach's load balancer kicks in.
Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = MAX_DEGREE_PARALLELISM }, workUnit =>
{
try
{
this.DoWork(workUnit);
}
finally
{
semaphore.Release();
}
});
}
}
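For completeness, here is what option 1 (the PLINQ variant) could look like in the Go method; a sketch under the same assumptions as the question's code:
// Requires: using System.Linq;
public void Go()
{
    List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
    IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());

    // PLINQ pulls items more conservatively than Parallel.ForEach's default
    // chunking partitioner, so the infinite FetchQueuedWork enumerable is
    // consumed roughly as worker slots free up.
    work.AsParallel()
        .WithDegreeOfParallelism(4)
        .ForAll(DoWork);
}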