I'm implementing a simple data loader over HTTP, following tips from my previous question C# .NET Parallel I/O operation (with throttling), answered by Throttling asynchronous tasks.
I split loading and deserialization, assuming that one may be slower or faster than the other. I also want to throttle downloading, but I don't want to throttle deserialization. Therefore I'm using two blocks and one buffer.
Unfortunately I'm facing a problem where this pipeline sometimes processes fewer messages than were sent (I know from the target server that I made exactly n requests, but I end up with fewer responses).
My method looks like this (no error handling):
public async Task<IEnumerable<DummyData>> LoadAsync(IEnumerable<Uri> uris)
{
    IList<DummyData> result;
    using (var client = new HttpClient())
    {
        var buffer = new BufferBlock<DummyData>();
        var downloader = new TransformBlock<Uri, string>(
            async u => await client.GetStringAsync(u),
            new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = _maxParallelism });
        var deserializer = new TransformBlock<string, DummyData>(
            s => JsonConvert.DeserializeObject<DummyData>(s),
            new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        downloader.LinkTo(deserializer, linkOptions);
        deserializer.LinkTo(buffer, linkOptions);

        foreach (Uri uri in uris)
        {
            await downloader.SendAsync(uri);
        }
        downloader.Complete();
        await downloader.Completion;
        buffer.TryReceiveAll(out result);
    }
    return result;
}
So to be more specific: I have 100 URLs to load, but I get back 90-99 responses. No error is thrown, and the server reports that it handled 100 requests. This happens randomly; most of the time the code behaves correctly.
There are three issues with your code:
Awaiting the completion of the first block of the pipeline (downloader) instead of the last (buffer).
Using the TryReceiveAll method for retrieving the messages of the buffer block. The correct way to retrieve all messages from an unlinked block without introducing race conditions is to use the OutputAvailableAsync and TryReceive methods in a nested loop. You can find examples here and here.
In case of a timeout the HttpClient throws an unexpected TaskCanceledException, and the TPL Dataflow blocks happen to ignore exceptions of this type. The combination of these two unfortunate realities means that by default any timeout occurrence will remain unobserved. To fix this problem you could change your code like this:
var downloader = new TransformBlock<Uri, string>(async url =>
{
try
{
return await client.GetStringAsync(url);
}
catch (OperationCanceledException) { throw new TimeoutException(); }
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = _maxParallelism });
A fourth unrelated issue is the use of the MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded option for the deserializer block. In the (hopefully unlikely) case that the deserializer is slower than the downloader, the deserializer will start queuing more and more work on the ThreadPool, keeping it permanently starved. This will not be good for the performance and the responsiveness of your application, or for the health of the system as a whole. In practice there is rarely a reason to configure a CPU-bound block with a MaxDegreeOfParallelism larger than Environment.ProcessorCount.
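Putting these fixes together, here is a minimal sketch of what the corrected method could look like (awaiting the last block, draining the buffer with OutputAvailableAsync/TryReceive, surfacing timeouts, and capping the deserializer at Environment.ProcessorCount). Adapt it to your actual error-handling needs:
public async Task<IEnumerable<DummyData>> LoadAsync(IEnumerable<Uri> uris)
{
    var result = new List<DummyData>();
    using (var client = new HttpClient())
    {
        var buffer = new BufferBlock<DummyData>();
        var downloader = new TransformBlock<Uri, string>(async u =>
        {
            try { return await client.GetStringAsync(u); }
            catch (OperationCanceledException) { throw new TimeoutException(); }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = _maxParallelism });
        var deserializer = new TransformBlock<string, DummyData>(
            s => JsonConvert.DeserializeObject<DummyData>(s),
            new ExecutionDataflowBlockOptions
                { MaxDegreeOfParallelism = Environment.ProcessorCount });
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        downloader.LinkTo(deserializer, linkOptions);
        deserializer.LinkTo(buffer, linkOptions);

        foreach (Uri uri in uris)
        {
            await downloader.SendAsync(uri);
        }
        downloader.Complete();

        // Drain the last block with OutputAvailableAsync/TryReceive,
        // then await its completion to observe any pipeline exception.
        while (await buffer.OutputAvailableAsync())
        {
            while (buffer.TryReceive(out DummyData item)) result.Add(item);
        }
        await buffer.Completion;
    }
    return result;
}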
Related
I have the following code that gets called from a Controller.
public async Task Execute()
{
    var collections = await _repo.GetCollections(); // This gets 500+ items
    List<Object1> coolCollections = new List<Object1>();
    List<Object2> uncoolCollections = new List<Object2>();
    foreach (var collection in collections)
    {
        if (collection == "Something")
        {
            var specialObject = TurnObjectIntoSpecialObject(collection);
            uncoolCollections.Add(specialObject);
        }
        else
        {
            var anotherObject = TurnObjectIntoAnotherObject(collection);
            coolCollections.Add(anotherObject);
        }
    }
    var list1Async = coolCollections.Select(async obj => await restService.PostObject1(obj)); // each call takes 200 -> 2000ms
    var list2Async = uncoolCollections.Select(async obj => await restService.PostObject2(obj)); // each call takes 300 -> 3000ms
    var asyncTasks = list1Async.Concat<Task>(list2Async);
    await Task.WhenAll(asyncTasks); // that's 500+ tasks
}
Unfortunately, I'm getting a 504 error after around 300 or so requests. I can't change the API the RestService calls, so I'm stuck trying to make the above code more performant.
Changing Task.WhenAll to a foreach loop does work and does resolve the timeout, but it's very slow.
My question is: how can I make sure the above code does not time out after x number of requests?
Making more concurrent calls to a remote site or database doesn't improve throughput, quite the opposite. Conflicts between the concurrent operations mean that beyond a certain point everything starts taking more time, until the service crashes. Right now you have a possible 300-way blocking problem.
The way all services handle this is by restricting the number of concurrent connections. In fact, many services will throttle clients so they don't crash if someone ... sends 500 concurrent requests. Some may even tell you they're throttling you with a 429 response.
Another way is to use batch requests, so instead of making 300 or 500 calls you send a single batch of 500 operations, allowing the service to handle them in an efficient way.
You can use Parallel.ForEachAsync to execute multiple calls with a specified degree of parallelism:
ParallelOptions parallelOptions = new()
{
MaxDegreeOfParallelism = 30
};
await Parallel.ForEachAsync(object1, parallelOptions, async obj =>
{
await restService.PostObject1(obj);
});
await Parallel.ForEachAsync(object2, parallelOptions, async obj =>
{
await restService.PostObject2(obj);
});
You can adjust the DOP to find what works best without slowing down the remote service.
If the service can handle it, you could start both pipelines concurrently and await both of them to complete:
ParallelOptions parallelOptions = new()
{
MaxDegreeOfParallelism = 30
};
var task1=Parallel.ForEachAsync(object1, parallelOptions, async obj =>
{
await restService.PostObject1(obj);
});
var task2=Parallel.ForEachAsync(object2, parallelOptions, async obj =>
{
await restService.PostObject2(obj);
});
await Task.WhenAll(task1,task2);
It doesn't make sense to retry those operations before you limit the DOP. Retrying 500 failed requests will only lead to another failure. Retrying with a random or staggered delay is essentially the same as limiting the DOP from the start, except it takes far longer to complete.
Since you are unable to change the REST service, the 504 gateway timeout will remain.
A better solution would be to use a retry mechanism: if you receive a 504 error code, then you retry after 'x' seconds.
Why retry after 'x' seconds?
The reasons for the 504 could be many; it could be that the server does not handle more requests because it is at its maximum workload at the moment.
A good and battle-tested library for such a retry mechanism is Polly (a sketch follows below).
You could also write your own function or action depending on the return type.
Depending on the happy-flow use case other methods could be used, but in this situation I went with the assumption that you want to upload all the data even if an exception occurs.
Something to think about: if this service is provided by a third-party vendor, then look into the documentation. It will most likely have a section on the maximum number of concurrent connections to the service.
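For illustration, a minimal sketch of such a Polly retry policy, assuming the service call throws an HttpRequestException when the gateway times out (the handled exception type, retry count and delays are assumptions to adjust for your case):
using Polly;

// Retry up to 3 times, waiting 2, 4, then 8 seconds between attempts.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

foreach (var obj in coolCollections)
{
    // Each call is retried individually; failures after the last retry still throw.
    await retryPolicy.ExecuteAsync(() => restService.PostObject1(obj));
}
The same policy can be combined with the degree-of-parallelism limiting shown above, by wrapping the call inside the Parallel.ForEachAsync body.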
This answer assumes that you are using .NET 6 or later. You could project the objects to an enumerable of object elements, and then parallelize the processing of the projected objects with the Parallel.ForEachAsync method. This method allows you to configure the MaxDegreeOfParallelism, so that not all projected objects are processed at once. As a result the remote server will not be bombarded with more requests than it can handle, nor will the network bandwidth be saturated. The optimal MaxDegreeOfParallelism can be found by experimentation. Start with a small number, like 5, and then gradually increase it until you find the sweet spot that offers the best performance.
public async Task Execute()
{
var objects = await _repo.GetObjects();
IEnumerable<object> projected = objects.Select(obj =>
{
if (obj == "Something")
{
return (object)TurnObjectIntoSpecialObject(obj);
}
else
{
return (object)TurnObjectIntoAnotherObject(obj);
}
});
ParallelOptions options = new()
{
MaxDegreeOfParallelism = 5
};
await Parallel.ForEachAsync(projected, options, async (item, ct) =>
{
switch (item)
{
case Object1 obj1: await restService.PostObject1(obj1); break;
case Object2 obj2: await restService.PostObject2(obj2); break;
default: throw new NotImplementedException();
}
});
}
The above code processes the objects in the same order that they appear in the objects sequence. The PostObject1/PostObject2 operations are parallelized, but the TurnObjectIntoSpecialObject/TurnObjectIntoAnotherObject operations are not. If you want to parallelize these too, then you can feed Parallel.ForEachAsync with the objects and do the projection inside the parallel loop, as sketched below.
In case of errors, only the first exception will be propagated. If you want to propagate all the errors, you can find solutions here.
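A minimal sketch of that variant, reusing the same (pseudocode) condition and hypothetical services from the question, with the projection moved inside the loop so it is parallelized as well:
ParallelOptions options = new()
{
    MaxDegreeOfParallelism = 5
};

await Parallel.ForEachAsync(objects, options, async (obj, ct) =>
{
    if (obj == "Something")
    {
        // Projection and posting both happen inside the parallel loop.
        var specialObject = TurnObjectIntoSpecialObject(obj);
        await restService.PostObject2(specialObject);
    }
    else
    {
        var anotherObject = TurnObjectIntoAnotherObject(obj);
        await restService.PostObject1(anotherObject);
    }
});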
Typical situation: Fast producer, slow consumer, need to slow producer down.
Sample code that doesn't work as I expected (explained below):
// I assumed this block will behave like BlockingCollection, but it doesn't
var bb = new BufferBlock<int>(new DataflowBlockOptions {
BoundedCapacity = 3, // looks like this does nothing
});
// fast producer
int dataSource = -1;
var producer = Task.Run(() => {
while (dataSource < 10) {
var message = ++dataSource;
bb.Post(message);
Console.WriteLine($"Posted: {message}");
}
Console.WriteLine("Calling .Complete() on buffer block");
bb.Complete();
});
// slow consumer
var ab = new ActionBlock<int>(i => {
Thread.Sleep(500);
Console.WriteLine($"Received: {i}");
}, new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 2,
});
bb.LinkTo(ab);
ab.Completion.Wait();
How I thought this code would work, but it doesn't:
BufferBlock bb is the blocking queue with a capacity of 3. Once the capacity is reached, the producer should not be able to .Post() to it until there's a vacant slot.
Doesn't work like that. bb seems to happily accept any number of messages.
producer is a task that quickly Posts messages. Once all messages have been posted, the call to bb.Complete() should propagate through the pipeline and signal shutdown once all messages have been processed. Hence waiting ab.Completion.Wait(); at the end.
Doesn't work either. As soon as .Complete() is called, action block ab won't receive any more messages.
This can be done with a BlockingCollection, which I thought BufferBlock was the equivalent of in the TPL Dataflow (TDF) world. I guess I'm misunderstanding how backpressure is supposed to work in TPL Dataflow.
So where's the catch? How to run this pipeline, not allowing more than 3 messages in the buffer bb, and wait for its completion?
PS: I found this gist (https://gist.github.com/mnadel/df2ec09fe7eae9ba8938) where it's suggested to maintain a semaphore to block writing to BufferBlock. I thought this was "built-in".
Update after accepting the answer:
If you're looking at this question, you need to remember that ActionBlock also has its own input buffer.
That's for one. Then you also need to realize that, because all blocks have their own input buffers, you don't need the BufferBlock for what you might think its name implies. A BufferBlock is more like a utility block for more complex architectures, or a load-balancing block; it's not a backpressure buffer.
Completion propagation needs to be defined explicitly at the link level.
When calling .LinkTo() you need to explicitly pass new DataflowLinkOptions { PropagateCompletion = true } as the 2nd argument.
To introduce backpressure you need to use SendAsync when you send items into the block. This allows your producer to wait for the block to be ready for the item. Something like this is what you're looking for:
class Program
{
static async Task Main()
{
var options = new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 3
};
var block = new ActionBlock<int>(async i =>
{
await Task.Delay(100);
Console.WriteLine(i);
}, options);
//Producer
foreach (var i in Enumerable.Range(0, 10))
{
await block.SendAsync(i);
}
block.Complete();
await block.Completion;
}
}
If you change this to use Post and print the result of the Post you'll see that many items fail to be passed to the block:
class Program
{
static async Task Main()
{
var options = new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 1
};
var block = new ActionBlock<int>(async i =>
{
await Task.Delay(1000);
Console.WriteLine(i);
}, options);
//Producer
foreach (var i in Enumerable.Range(0, 10))
{
var result = block.Post(i);
Console.WriteLine(result);
}
block.Complete();
await block.Completion;
}
}
Output:
True
False
False
False
False
False
False
False
False
False
0
With the guidance from JSteward's answer, I came up with the following code.
It produces (reads etc.) new items concurrently with processing said items, maintaining a read-ahead buffer.
The completion signal is sent to the head of the chain when the "producer" has no more items.
The program also awaits the completion of the whole chain before terminating.
static async Task Main() {
string Time() => $"{DateTime.Now:hh:mm:ss.fff}";
// the buffer is added to the chain just for demonstration purposes
// the chain would work fine using just the built-in input buffer
// of the `action` block.
var buffer = new BufferBlock<int>(new DataflowBlockOptions {BoundedCapacity = 3});
var action = new ActionBlock<int>(async i =>
{
Console.WriteLine($"[{Time()}]: Processing: {i}");
await Task.Delay(500);
}, new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 2, BoundedCapacity = 2});
// it's necessary to set `PropagateCompletion` property
buffer.LinkTo(action, new DataflowLinkOptions {PropagateCompletion = true});
//Producer
foreach (var i in Enumerable.Range(0, 10))
{
Console.WriteLine($"[{Time()}]: Ready to send: {i}");
await buffer.SendAsync(i);
Console.WriteLine($"[{Time()}]: Sent: {i}");
}
// we call `.Complete()` on the head of the chain and it's propagated forward
buffer.Complete();
await action.Completion;
}
I'm trying to do a parallel SqlBulkCopy to multiple targets over a WAN, many of which may have slow connections and/or connection cutoffs; their connection speeds vary from 2 to 50 Mbit download, and I am sending from a connection with 1000 Mbit upload. A lot of the targets need multiple retries to finish correctly.
I'm currently using a Parallel.ForEach on the GetConsumingEnumerable() of a BlockingCollection (queue); however I either stumbled upon some bug, or I am having problems fully understanding its purpose, or simply got something wrong.
The code never calls the CompleteAdding() method of the BlockingCollection; it seems that somewhere in the parallel foreach loop some of the targets get lost.
Even if there are different approaches to this, and disregarding the kind of work being done in the loop, the BlockingCollection shouldn't behave the way it does in this example, should it?
In the foreach loop, I do the work and add the target to a results collection in case it completed successfully, or re-add the target to the BlockingCollection in case of an error, until the target reaches the max retries threshold; at that point I add it to the results collection.
In an additional Task, I loop until the count of the results collection equals the initial count of the targets; then I call CompleteAdding() on the blocking collection.
I already tried using a locking object for the operations on the results collection (using a List<int> instead) and the queue, with no luck, but that shouldn't be necessary anyway. I also tried adding the retries to a separate collection and re-adding those to the BlockingCollection in a different Task instead of in the Parallel.ForEach.
Just for fun I also tried compiling with .NET from 4.5 to 4.8, and different C# language versions.
Here is a simplified example:
List<int> targets = new List<int>();
for (int i = 0; i < 200; i++)
{
targets.Add(0);
}
BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
ConcurrentBag<int> results = new ConcurrentBag<int>();
targets.ForEach(f => queue.Add(f));
// Bulkcopy in die Filialen:
Task.Run(() =>
{
while (results.Count < targets.Count)
{
Thread.Sleep(2000);
Console.WriteLine($"Completed: {results.Count} / {targets.Count} | queue: {queue.Count}");
}
queue.CompleteAdding();
});
int MAX_RETRIES = 10;
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 50 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
try
{
// simulate a problem with the bulkcopy:
throw new Exception();
results.Add(target);
}
catch (Exception)
{
if (target < MAX_RETRIES)
{
target++;
if (!queue.TryAdd(target))
Console.WriteLine($"{target.ToString("D3")}: Error, can't add to queue!");
}
else
{
results.Add(target);
Console.WriteLine($"Aborted after {target + 1} tries | {results.Count} / {targets.Count} items finished.");
}
}
});
I expected the count of the results collection to be the exact count of the targets list in the end, but it seems to never reach that number, which results in the BlockingCollection never being marked as completed, so the code never finishes.
I really don't understand why not all of the targets eventually get added to the results collection! The added count always varies, and is mostly just shy of the expected final count.
EDIT: I removed the retry part and replaced the ConcurrentBag with a simple int counter, and it still doesn't work most of the time:
List<int> targets = new List<int>();
for (int i = 0; i < 500; i++)
targets.Add(0);
BlockingCollection<int> queue = new BlockingCollection<int>(new ConcurrentQueue<int>());
//ConcurrentBag<int> results = new ConcurrentBag<int>();
int completed = 0;
targets.ForEach(f => queue.Add(f));
var thread = new Thread(() =>
{
while (completed < targets.Count)
{
Thread.Sleep(2000);
Console.WriteLine($"Completed: {completed} / {targets.Count} | queue: {queue.Count}");
}
queue.CompleteAdding();
});
thread.Start();
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
Interlocked.Increment(ref completed);
});
Sorry, found the answer: the default partitioner used by BlockingCollection and Parallel.ForEach is chunking and buffering, which results in the foreach loop waiting forever for enough items for the next chunk. For me, it sat there for a whole night without processing the last few items!
So, instead of:
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(queue.GetConsumingEnumerable(), options, target =>
{
Interlocked.Increment(ref completed);
});
you have to use:
var partitioner = Partitioner.Create(queue.GetConsumingEnumerable(), EnumerablePartitionerOptions.NoBuffering);
ParallelOptions options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
Parallel.ForEach(partitioner, options, target =>
{
Interlocked.Increment(ref completed);
});
Parallel.ForEach is meant for data parallelism (i.e. processing 100K rows using all 8 cores), not concurrent operations. This is essentially a pub/sub and async problem, if not a pipeline problem. There's nothing for the CPU to do in this case; just start the async operations and wait for them to complete.
.NET has handled this since .NET 4.5 through the Dataflow classes and, more recently, the lower-level System.Threading.Channels namespace.
In its simplest form, you can create an ActionBlock<> that takes a buffer and target connection and publishes the data. Let's say you use this method to send the data to a server:
async Task MyBulkCopyMethod(string connectionString,DataTable data)
{
using(var bcp=new SqlBulkCopy(connectionString))
{
//Set up mappings etc.
//....
await bcp.WriteToServerAsync(data);
}
}
You can use this with an ActionBlock class with a configured degree of parallelism. Dataflow classes like ActionBlock have their own input buffers (and, where appropriate, output buffers), so there's no need to create a separate queue:
class DataMessage
{
public string Connection{get;set;}
public DataTable Data {get;set;}
}
...
var options = new ExecutionDataflowBlockOptions {
    MaxDegreeOfParallelism = 50,
    BoundedCapacity = 8
};
var block = new ActionBlock<DataMessage>(
    msg => MyBulkCopyMethod(msg.Connection, msg.Data), options);
We can now start posting messages to the block. By setting the capacity to 8 we ensure the input buffer won't get filled with large messages if the block is too slow. MaxDegreeOfParallelism controls how many operations run concurrently. Let's say we want to send the same data to many servers:
var data = .....;
var servers = new[] { connString1, connString2, .... };
var messages = from sv in servers
               select new DataMessage { Connection = sv, Data = data };
foreach(var msg in messages)
{
await block.SendAsync(msg);
}
//Tell the block we are done
block.Complete();
//Await for all messages to finish processing
await block.Completion;
Retries
One possibility for retries is to use a retry loop in the worker function. A better idea would be to use a different block and post failed messages there.
var block = new ActionBlock<DataMessage>(async msg => {
    try {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
    catch (SqlException exc) when (some retry condition)
    {
        //Post without awaiting
        retryBlock.Post(msg);
    }
}, options);
When the original block completes we want to tell the retry block to complete as well, no matter what:
block.Completion.ContinueWith(_=>retryBlock.Complete());
Now we can await the retryBlock to complete.
That block could have a smaller DOP and perhaps a delay between attempts:
var retryOptions = new ExecutionDataflowBlockOptions {
    MaxDegreeOfParallelism = 5
};
var retryBlock = new ActionBlock<DataMessage>(async msg => {
    await Task.Delay(1000);
    try {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
    catch (Exception ....)
    {
        ...
    }
}, retryOptions);
This pattern can be repeated to create multiple levels of retry, or different conditions. It can also be used to create different-priority workers, by giving a larger DOP to high-priority workers or a larger delay to low-priority workers.
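For completeness, a minimal sketch of how the same idea could look with the lower-level System.Threading.Channels API mentioned earlier; this is an illustrative alternative (a bounded channel plus a fixed number of worker tasks), reusing the messages sequence and MyBulkCopyMethod from above, not part of the Dataflow solution:
using System.Linq;
using System.Threading.Channels;

// Bounded channel plays the role of BoundedCapacity = 8.
var channel = Channel.CreateBounded<DataMessage>(8);

// A fixed number of workers replaces MaxDegreeOfParallelism = 50.
var workers = Enumerable.Range(0, 50).Select(async _ =>
{
    await foreach (var msg in channel.Reader.ReadAllAsync())
    {
        await MyBulkCopyMethod(msg.Connection, msg.Data);
    }
}).ToArray();

foreach (var msg in messages)
{
    // WriteAsync waits when the channel is full, providing backpressure.
    await channel.Writer.WriteAsync(msg);
}
channel.Writer.Complete();
await Task.WhenAll(workers);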
I am learning about the TPL Dataflow Library. So far it's exactly what I was looking for.
I've created a simple class (below) that performs the following functions
Upon execution of ImportPropertiesForBranch, I go to a 3rd-party API and get a list of properties.
An XML list is returned and deserialized into an array of property data (id, API endpoint, last updated). There are around 400+ properties (as in houses).
I then use a Parallel.For to SendAsync the property data into my propertyBufferBlock.
The propertyBufferBlock is linked to a propertyXmlBlock (which itself is a TransformBlock).
The propertyXmlBlock then (asynchronously) goes back to the API (using the API endpoint supplied in the property data) and fetches the property XML for deserialization.
Once the XML is returned and becomes available, we can then deserialize it.
Later, I'll add more TransformBlocks to persist it to a data store.
So my questions are;
Are there any potential bottlenecks or areas of the code that could be troublesome? I'm aware that I've not included any error handling or cancellation (this is to come).
Is it OK to await async calls inside a TransformBlock, or is this a bottleneck?
Although the code works, I am worried about the buffering and asynchronicity of the Parallel.For, the BufferBlock and the async calls in the TransformBlock. I'm not sure it's the best way and I may be mixing up some concepts.
Any guidance, improvements and pitfall advice welcome.
using System.Diagnostics;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
using My.Interfaces;
using My.XmlService.Models;
namespace My.ImportService
{
public class ImportService
{
private readonly IApiService _apiService;
private readonly IXmlService _xmlService;
private readonly IRepositoryService _repositoryService;
public ImportService(IApiService apiService,
IXmlService xmlService,
IRepositoryService repositoryService)
{
_apiService = apiService;
_xmlService = xmlService;
_repositoryService = repositoryService;
ConstructPipeline();
}
private BufferBlock<propertiesProperty> propertyBufferBlock;
private TransformBlock<propertiesProperty, string> propertyXmlBlock;
private TransformBlock<string, propertyType> propertyDeserializeBlock;
private ActionBlock<propertyType> propertyCompleteBlock;
public async Task<bool> ImportPropertiesForBranch(string branchName, int branchUrlId)
{
var propertyListXml = await _apiService.GetPropertyListAsync(branchUrlId);
if (string.IsNullOrEmpty(propertyListXml))
return false;
var properties = _xmlService.DeserializePropertyList(propertyListXml);
if (properties?.property == null || properties.property.Length == 0)
return false;
// limited to the first 20 for testing
Parallel.For(0, 20,
new ParallelOptions {MaxDegreeOfParallelism = 3},
i => propertyBufferBlock.SendAsync(properties.property[i]));
propertyBufferBlock.Complete();
await propertyCompleteBlock.Completion;
return true;
}
private void ConstructPipeline()
{
propertyBufferBlock = GetPropertyBuffer();
propertyXmlBlock = GetPropertyXmlBlock();
propertyDeserializeBlock = GetPropertyDeserializeBlock();
propertyCompleteBlock = GetPropertyCompleteBlock();
propertyBufferBlock.LinkTo(
propertyXmlBlock,
new DataflowLinkOptions {PropagateCompletion = true});
propertyXmlBlock.LinkTo(
propertyDeserializeBlock,
new DataflowLinkOptions {PropagateCompletion = true});
propertyDeserializeBlock.LinkTo(
propertyCompleteBlock,
new DataflowLinkOptions {PropagateCompletion = true});
}
private BufferBlock<propertiesProperty> GetPropertyBuffer()
{
return new BufferBlock<propertiesProperty>();
}
private TransformBlock<propertiesProperty, string> GetPropertyXmlBlock()
{
return new TransformBlock<propertiesProperty, string>(async propertiesProperty =>
{
Debug.WriteLine($"getting xml {propertiesProperty.prop_id}");
var propertyXml = await _apiService.GetXmlAsStringAsync(propertiesProperty.url);
return propertyXml;
},
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 1,
BoundedCapacity = 2
});
}
private TransformBlock<string, propertyType> GetPropertyDeserializeBlock()
{
return new TransformBlock<string, propertyType>(xmlAsString =>
{
Debug.WriteLine($"deserializing");
var propertyType = _xmlService.DeserializeProperty(xmlAsString);
return propertyType;
},
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 1,
BoundedCapacity = 2
});
}
private ActionBlock<propertyType> GetPropertyCompleteBlock()
{
return new ActionBlock<propertyType>(propertyType =>
{
Debug.WriteLine($"complete {propertyType.id}");
Debug.WriteLine(propertyType.address.display);
},
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 1,
BoundedCapacity = 2
});
}
}
}
You're actually doing some things the wrong way:
i => propertyBufferBlock.SendAsync(properties.property[i])
You need to await that method, otherwise you're creating too many simultaneous tasks.
Also this line:
MaxDegreeOfParallelism = 1
will limit the execution of your blocks to sequential execution, which can degrade your performance.
As you said in the comments, you switched to the synchronous Post method and limited the capacity of the blocks by setting the BoundedCapacity. This variant should be used with caution, as you need to check its return value, which states whether the message has been accepted or not (a sketch follows below).
As for your worries about awaiting async methods inside the blocks: it is absolutely OK, and should be done as in other cases of async method usage.
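For illustration, a minimal sketch of what checking that return value could look like with the bounded blocks and fields from the question (the fallback to SendAsync is an assumption; in practice using SendAsync everywhere is usually simpler):
// limited to the first 20 for testing, as in the original code
foreach (var property in properties.property.Take(20))
{
    // Post returns false when the block refuses the message,
    // e.g. because its BoundedCapacity is full, so the result must not be ignored.
    bool accepted = propertyBufferBlock.Post(property);
    if (!accepted)
    {
        // Fall back to the asynchronous path, which waits for free capacity instead.
        await propertyBufferBlock.SendAsync(property);
    }
}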
Are there any potential bottlenecks or areas of the code that could be troublesome?
In general your approach looks good, and the potential bottleneck is that you are limiting the parallel processing of your blocks with MaxDegreeOfParallelism = 1. Based on the description of the problem, each item can be processed independently of the others, and that's why you can process multiple items at a time.
Is it ok to await async calls inside a TransformBlock or is this a bottleneck?
It is perfectly fine, because TPL Dataflow supports async operations.
Although the code works, I am worried about the buffering and asynchronicity of the Parallel.For, the BufferBlock and the async calls in the TransformBlock. I'm not sure it's the best way and I may be mixing up some concepts.
One potential problem in your code that could make you shoot yourself in the foot is calling an async method in Parallel.For and then calling propertyBufferBlock.Complete();. The problem here is that Parallel.For does not support async actions, and the way you invoke it calls propertyBufferBlock.SendAsync and moves on before the returned task is completed. This means that by the time Parallel.For exits, some operations might still be running and their items not yet added to the buffer block. If you then call propertyBufferBlock.Complete();, those pending items will throw an exception and won't be added for processing. You will get an unobserved exception.
You could use ForEachAsync from this blog post to ensure that all items are added to the block before completing it. But if you are still limiting processing to 1 operation, you can simply add the items one at a time, as sketched below. I am not sure how propertyBufferBlock.SendAsync is implemented, but it may internally restrict adding to one item at a time, so parallel adding would not make any sense.
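A minimal sketch of that simpler approach, reusing the fields from the question and awaiting each SendAsync so that every item is in the pipeline before Complete is called:
// limited to the first 20 for testing, as in the original code
foreach (var property in properties.property.Take(20))
{
    // Awaiting SendAsync also honors the blocks' BoundedCapacity (backpressure).
    await propertyBufferBlock.SendAsync(property);
}

propertyBufferBlock.Complete();
await propertyCompleteBlock.Completion;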
I have a problem connecting BroadcastBlock(s) to BatchBlocks. The scenario is that the sources are BroadcastBlocks, and recipients are BatchBlocks.
In the simplified code below, only one of the supplemental action blocks executes. I even set the batchSize for each BatchBlock to 1 to illustrate the problem.
Setting Greedy to "true" would make the 2 ActionBlocks execute, but that's not what I want as it will cause the BatchBlock to proceed even if it's not complete yet. Any ideas?
class Program
{
static void Main(string[] args)
{
// My possible sources are BroadcastBlocks. Could be more
var source1 = new BroadcastBlock<int>(z => z);
// batch 1
// can be many potential sources, one for now
// I want all sources to arrive first before proceeding
var batch1 = new BatchBlock<int>(1, new GroupingDataflowBlockOptions() { Greedy = false });
var batch1Action = new ActionBlock<int[]>(arr =>
{
// this does not run sometimes
Console.WriteLine("Received from batch 1 block!");
foreach (var item in arr)
{
Console.WriteLine("Received {0}", item);
}
});
batch1.LinkTo(batch1Action, new DataflowLinkOptions() { PropagateCompletion = true });
// batch 2
// can be many potential sources, one for now
// I want all sources to arrive first before proceeding
var batch2 = new BatchBlock<int>(1, new GroupingDataflowBlockOptions() { Greedy = false });
var batch2Action = new ActionBlock<int[]>(arr =>
{
// this does not run sometimes
Console.WriteLine("Received from batch 2 block!");
foreach (var item in arr)
{
Console.WriteLine("Received {0}", item);
}
});
batch2.LinkTo(batch2Action, new DataflowLinkOptions() { PropagateCompletion = true });
// connect source(s)
source1.LinkTo(batch1, new DataflowLinkOptions() { PropagateCompletion = true });
source1.LinkTo(batch2, new DataflowLinkOptions() { PropagateCompletion = true });
// fire
source1.SendAsync(3);
Task.WaitAll(new Task[] { batch1Action.Completion, batch2Action.Completion }); ;
Console.ReadLine();
}
}
It seems that there is a flaw in the internal mechanism of the TPL Dataflow library that supports the non-greedy functionality. What happens is that a BatchBlock configured as non-greedy will postpone all offered messages from linked blocks, instead of accepting them. It keeps an internal queue with the messages it has postponed, and when their number reaches its BatchSize configuration it attempts to consume the postponed messages, and if successful it propagates them downstream as expected. The problem is that source blocks like the BroadcastBlock and the BufferBlock will stop offering more messages to a block that has postponed a previously offered message, until that single message has been consumed. The combination of these two behaviors results in a deadlock. No forward progress can be made, because the BatchBlock waits for more messages to be offered before consuming the postponed ones, and the BroadcastBlock waits for the postponed messages to be consumed before offering more messages...
This scenario occurs only with BatchSize larger than one (which is the typical configuration for this block).
Here is a demonstration of this problem. As the source, the more common BufferBlock is used instead of a BroadcastBlock. 10 messages are posted to a three-block pipeline, and the expected behavior is for the messages to flow through the pipeline to the last block. In reality nothing happens, and all messages remain stuck in the first block.
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;
public static class Program
{
static void Main(string[] args)
{
var bufferBlock = new BufferBlock<int>();
var batchBlock = new BatchBlock<int>(batchSize: 2,
new GroupingDataflowBlockOptions() { Greedy = false });
var actionBlock = new ActionBlock<int[]>(batch =>
Console.WriteLine($"Received: {String.Join(", ", batch)}"));
bufferBlock.LinkTo(batchBlock,
new DataflowLinkOptions() { PropagateCompletion = true });
batchBlock.LinkTo(actionBlock,
new DataflowLinkOptions() { PropagateCompletion = true });
for (int i = 1; i <= 10; i++)
{
var accepted = bufferBlock.Post(i);
Console.WriteLine(
$"bufferBlock.Post({i}) {(accepted ? "accepted" : "rejected")}");
Thread.Sleep(100);
}
bufferBlock.Complete();
actionBlock.Completion.Wait(millisecondsTimeout: 1000);
Console.WriteLine();
Console.WriteLine($"bufferBlock.Completion: {bufferBlock.Completion.Status}");
Console.WriteLine($"batchBlock.Completion: {batchBlock.Completion.Status}");
Console.WriteLine($"actionBlock.Completion: {actionBlock.Completion.Status}");
Console.WriteLine($"bufferBlock.Count: {bufferBlock.Count}");
}
}
Output:
bufferBlock.Post(1) accepted
bufferBlock.Post(2) accepted
bufferBlock.Post(3) accepted
bufferBlock.Post(4) accepted
bufferBlock.Post(5) accepted
bufferBlock.Post(6) accepted
bufferBlock.Post(7) accepted
bufferBlock.Post(8) accepted
bufferBlock.Post(9) accepted
bufferBlock.Post(10) accepted
bufferBlock.Completion: WaitingForActivation
batchBlock.Completion: WaitingForActivation
actionBlock.Completion: WaitingForActivation
bufferBlock.Count: 10
My guess is that the internal offer-consume-reserve-release mechanism has been tuned for maximum efficiency in supporting the BoundedCapacity functionality, which is critical for many applications, and the rarely used Greedy = false functionality was not thoroughly tested.
The good news is that in your case you don't really need to set Greedy to false. A BatchBlock in the default greedy mode will not propagate fewer messages than the configured BatchSize, unless it has been marked as completed (in which case it propagates any leftover messages) or you manually call its TriggerBatch method at an arbitrary moment; a sketch follows below. The intended usage of the non-greedy configuration is for preventing resource starvation in complex graph scenarios with multiple dependencies between blocks.
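A minimal sketch of that greedy alternative, based on the demonstration program above: simply dropping the Greedy = false option lets the batches flow as expected:
var bufferBlock = new BufferBlock<int>();
// Default (greedy) mode: the BatchBlock accepts messages as they arrive
// and still emits them strictly in groups of batchSize.
var batchBlock = new BatchBlock<int>(batchSize: 2);
var actionBlock = new ActionBlock<int[]>(batch =>
    Console.WriteLine($"Received: {String.Join(", ", batch)}"));

bufferBlock.LinkTo(batchBlock,
    new DataflowLinkOptions() { PropagateCompletion = true });
batchBlock.LinkTo(actionBlock,
    new DataflowLinkOptions() { PropagateCompletion = true });

for (int i = 1; i <= 10; i++) bufferBlock.Post(i);

bufferBlock.Complete();
actionBlock.Completion.Wait(millisecondsTimeout: 1000);
// Expected output: Received: 1, 2 ... up to ... Received: 9, 10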