"bounded" BatchBlock => ActionBlock. How to complete the proper way? - c#

I'm trying to use a bounded batchblock linked to an action block.
I know when the feeding of items in the batchblock end and I want to trigger a completion chain.
The problem is: if my BatchBlock<T> is of a given BoundedCapacity I won't get all my items fired in the action block.
Here is a sample of my problem, it should (well in my understanding of TPL dataflow...) print 0 to 124 but it ends up printing 0 to 99.
There must be something I'm missing... Maybe BoundedCapacity means "drop items when queue count is over xxx..." if so how can I achieve a guaranteed maximum memory consumption?
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;
namespace ConsoleApplication
{
class Program
{
static void Main(string[] args)
{
int itemsCount = 125;
List<int> ints = new List<int>(itemsCount);
for (int i = 0; i < itemsCount; i++)
ints.Add(i);
BatchBlock<int> batchBlock = new BatchBlock<int>(50,new GroupingDataflowBlockOptions(){BoundedCapacity = 100});
ActionBlock<int[]> actionBlock = new ActionBlock<int[]>(intsBatch =>
{
Thread.Sleep(1000);
foreach (int i in intsBatch)
Console.WriteLine(i);
});
batchBlock.LinkTo(actionBlock, new DataflowLinkOptions() { PropagateCompletion = true });
// feed the batch block
foreach (int i in ints)
batchBlock.Post(i);
// Don't know how to end the proper way... Meaning it should display 0 to 124 and not 0 to 99
batchBlock.Complete();
batchBlock.TriggerBatch();
actionBlock.Completion.Wait();
}
}
}

Post on a block doesn't always succeed. It tries to post a message to the block but if the BoundedCapacity was reached it will fail and return false.
What you can do is use SendAsync instead which returns an awaitable task. If the block has room for your message it completes asynchronously. If it doesn't then the block returns a task that will complete when it does have room to accept a new message. You can await that task and throttle your insertions:
async Task MainAsync()
{
var ints = Enumerable.Range(0, 125).ToList();
var batchBlock = new BatchBlock<int>(50, new GroupingDataflowBlockOptions { BoundedCapacity = 100 });
var actionBlock = new ActionBlock<int[]>(intsBatch =>
{
Thread.Sleep(1000);
foreach (var i in intsBatch)
Console.WriteLine(i);
});
batchBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });
foreach (var i in ints)
await batchBlock.SendAsync(i); // wait synchronously for the block to accept.
batchBlock.Complete();
await actionBlock.Completion;
}

Related

Asynchronous Task, video buffering

I am trying to understand Tasks in C# but still having some problems. I am trying to create an application containing video. The main purpose is to read the video from a file (I am using Emgu.CV) and send it via TCP/IP for process in a board and then back in a stream (real-time) way. Firstly, I did it in serial. So, reading a Bitmap, sending-receiving from board, and plotting. But reading the bitmaps and plotting them takes too much time. I would like to have a Transmit, Receive FIFO Buffers that save the video frames, and a different task that does the job of sending receiving each frame. So I would like to do it in parallel. I thought I should create 3 Tasks:
tasks.Add(Task.Run(() => Video_load(video_path)));
tasks.Add(Task.Run(() => Video_Send_Recv(video_path)));
tasks.Add(Task.Run(() => VideoDisp_hw(32)));
Which I would like to run "parallel". What type of object should I use? A concurrent queue? BufferBlock? or just a list?
Thanks for the advices! I would like to ask something. I am trying to create a simple console program with 2 TPL blocks. 1 Block would be Transform block (taking a message i.e. "start" ) and loading data to a List and another block would be ActionBlock (just reading the data from the list and printing them). Here is the code below:
namespace TPL_Dataflow
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
Random randn = new Random();
var loadData = new TransformBlock<string, List<int>>(async sample_string =>
{
List<int> input_data = new List<int>();
int cnt = 0;
if (sample_string == "start")
{
Console.WriteLine("Inside loadData");
while (cnt < 16)
{
input_data.Add(randn.Next(1, 255));
await Task.Delay(1500);
Console.WriteLine("Cnt");
cnt++;
}
}
else
{
Console.WriteLine("Not started yet");
}
return input_data;
});
var PrintData = new ActionBlock<List<int>>(async input_data =>
{
while(input_data.Count > 0)
{
Console.WriteLine("output Data = " + input_data.First());
await Task.Delay(1000);
input_data.RemoveAt(0);
}
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
loadData.LinkTo(PrintData, input_data => input_data.Count() >0 );
//loadData.LinkTo(PrintData, linkOptions);
loadData.SendAsync("start");
loadData.Complete();
PrintData.Completion.Wait();
}
}
}
But it seems to work in serial way.. What am I doing wrong? I tried to do the while loops async. I would like to do the 2 things in parallel. When data available from the List then plotted.
You could use a TransformManyBlock<string, int> as the producer block, and an ActionBlock<int> as the consumer block. The TransformManyBlock would be instantiated with the constructor that accepts a Func<string, IEnumerable<int>> delegate, and passed an iterator method (the Produce method in the example below) that yields values one by one:
Random random = new Random();
var producer = new TransformManyBlock<string, int>(Produce);
IEnumerable<int> Produce(string message)
{
if (message == "start")
{
int cnt = 0;
while (cnt < 16)
{
int value;
lock (random) value = random.Next(1, 255);
Console.WriteLine($"Producing #{value}");
yield return value;
Thread.Sleep(1500);
cnt++;
}
}
else
{
yield break;
}
}
var consumer = new ActionBlock<int>(async value =>
{
Console.WriteLine($"Received: {value}");
await Task.Delay(1000);
});
producer.LinkTo(consumer, new() { PropagateCompletion = true });
producer.Post("start");
producer.Complete();
consumer.Completion.Wait();
Unfortunately the producer has to block the worker thread during the idle period between yielding each value (Thread.Sleep(1500);), because the TransformManyBlock currently does not have a constructor that accepts a Func<string, IAsyncEnumerable<int>>. This will be probably fixed in the next release of the TPL Dataflow library. You could track this GitHub issue, to be informed about when this feature will be released.
Alternative solution: Instead of linking explicitly the producer and the consumer, you could keep them unlinked, and send manually the values produced by the producer to the consumer. In this case both blocks would be ActionBlocks:
Random random = new Random();
var consumer = new ActionBlock<int>(async value =>
{
Console.WriteLine($"Received: {value}");
await Task.Delay(1000);
});
var producer = new ActionBlock<string>(async message =>
{
if (message == "start")
{
int cnt = 0;
while (cnt < 16)
{
int value;
lock (random) value = random.Next(1, 255);
Console.WriteLine($"Producing #{value}");
var accepted = await consumer.SendAsync(value);
if (!accepted) break; // The consumer has failed
await Task.Delay(1500);
cnt++;
}
}
});
PropagateCompletion(producer, consumer);
producer.Post("start");
producer.Complete();
consumer.Completion.Wait();
async void PropagateCompletion(IDataflowBlock source, IDataflowBlock target)
{
try { await source.Completion.ConfigureAwait(false); } catch { }
var ex = source.Completion.IsFaulted ? source.Completion.Exception : null;
if (ex != null) target.Fault(ex); else target.Complete();
}
The main difficulty with this approach is how to propagate the completion of the producer to the consumer, so that eventually both blocks are completed. Obviously you can't use the new DataflowLinkOptions { PropagateCompletion = true } configuration, since the blocks are not linked explicitly. You also can't Complete manually the consumer, because in this case it would stop prematurely accepting values from the producer. The solution to this problem is the PropagateCompletion method shown in the above example.

Dataflow blocks when some parallel process does a heavy job

I'm trying to understand TPL Dataflow.
I have two blocks inputBlock och nextBlock.
inputBlock using MaxDegreeOfParallelism = 2.
I have this situation that it can take diffrent time to parallell jobs to finish. I do not want the flow of data stops due some parallell job takes long time to finish.
I simply want each parallell job take one item from the queue and process it and then pass it to next block.
I do never reach nextBlock when one of the parallel job in the first block "inputBlock" goes to sleep or do a heavy job.
internal class Program
{
private static bool _sleep = true;
private static void Main(string[] args)
{
var inputBlock = new TransformBlock<string, string>(
x =>
{
if (_sleep)
{
_sleep = false;
Console.WriteLine("First thread sleeping");
Thread.Sleep(5000000);
}
Console.WriteLine("Second thread running");
return x;
},
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 2}); //1
var nextBlock = new TransformBlock<string, string>(
x =>
{
Console.WriteLine(x);
return x;
}); //2
inputBlock.LinkTo(nextBlock, new DataflowLinkOptions {PropagateCompletion = true});
for (var i = 0; i < 100; i++)
{
input.Post(i.ToString());
}
input.Complete();
Console.ReadLine();
}
}
}
Using EnsureOrdered = false was the answer.
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 2, EnsureOrdered = false});

Multicast block TPL Dataflow

I need to multicast a object into multiple path's
producer
|
multicast
| |
Process1 Process2
| |
Writedb WriteFile
the broadcast block is not helping much, it only does the latest to both proces1, process 2 , if process 2 is running late then it wont be able to receive messages.
db writer and write file have different data.
Here is the following code snippet.
class Program
{
public static void Main()
{
var broadCastBlock = new BroadcastBlock<int>(i => i);
var transformBlock1 = new TransformBlock<int, string>(i =>
{
Console.WriteLine("1 transformblock called: {0}", i);
//Thread.Sleep(4);
return string.Format("1_ {0},", i);
});
var transformBlock2 = new TransformBlock<int, string>(i =>
{
Console.WriteLine("2 transformblock called: {0}", i);
Thread.Sleep(100);
return string.Format("2_ {0},", i);
});
var processorBlockT1 = new ActionBlock<string>(i => Console.WriteLine("processBlockT1 {0}", i));
var processorBlockT2 = new ActionBlock<string>(i => Console.WriteLine("processBlockT2 {0}", i));
//Linking
broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock1.LinkTo(processorBlockT1, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock2.LinkTo(processorBlockT2, new DataflowLinkOptions { PropagateCompletion = true });
const int numElements = 100;
for (int i = 1; i <= numElements; i++)
{
broadCastBlock.SendAsync(i);
}
//completion handling
broadCastBlock.Completion.ContinueWith(x =>
{
Console.WriteLine("Braodcast block Completed");
transformBlock1.Complete();
transformBlock2.Complete();
Task.WhenAll(transformBlock1.Completion, transformBlock2.Completion).ContinueWith(_ =>
{
processorBlockT1.Complete();
processorBlockT2.Complete();
});
});
transformBlock1.Completion.ContinueWith(x => Console.WriteLine("Transform1 completed"));
transformBlock2.Completion.ContinueWith(x => Console.WriteLine("Transform2 completed"));
processorBlockT1.Completion.ContinueWith(x => Console.WriteLine("processblockT1 completed"));
processorBlockT2.Completion.ContinueWith(x => Console.WriteLine("processblockT2 completed"));
//mark completion
broadCastBlock.Complete();
Task.WhenAll(processorBlockT1.Completion, processorBlockT2.Completion).ContinueWith(_ => Console.WriteLine("completed both tasks")).Wait();
Console.WriteLine("Finished");
Console.ReadLine();
}
}
What is the best method of a guaranteed delivery by broadcast. i.e., a multicast.
should I just stick in two buffers at both ends and then consume it so that the buffers always collect what ever is coming in and then the process might take some time to process all of them?
The BroadcastBlock guarantees that all messages will be offered to all linked blocks. So it is exactly what you need. What you should fix though is the way you feed the BroadcastBlock with messages:
for (int i = 1; i <= numElements; i++)
{
broadCastBlock.SendAsync(i); // Don't do this!
}
The SendAsync method is supposed to be awaited. You should never have more than one pending SendAsync operations targeting the same block. Doing so not only breaks all guarantees about the order of the received messages, but it is also extremely memory-inefficient. The whole point of using bounded blocks is for controlling the memory usage by limiting the size of the internal buffers of the blocks. By issuing multiple un-awaited SendAsync commands you circumvent this self-imposed limitation by creating an external dynamic buffer of incomplete Tasks, with each task weighing hundreds of bytes, for propagating messages having just a fraction of this weight. These messages could be much more efficiently buffered internally, by not making the blocks bounded in the first place.
for (int i = 1; i <= numElements; i++)
{
await broadCastBlock.SendAsync(i); // Now it's OK
}

Why TPL Dataflow block.LinkTo does not give any output?

I am quite new to the topic TPL Dataflow. In the book Concurrency in C# I tested the following example. I can't figure out why there's no output which should be 2*2-2=2;
static void Main(string[] args)
{
//Task tt = test();
Task tt = test1();
Console.ReadLine();
}
static async Task test1()
{
try
{
var multiplyBlock = new TransformBlock<int, int>(item =>
{
if (item == 1)
throw new InvalidOperationException("Blech.");
return item * 2;
});
var subtractBlock = new TransformBlock<int, int>(item => item - 2);
multiplyBlock.LinkTo(subtractBlock,
new DataflowLinkOptions { PropagateCompletion = true });
multiplyBlock.Post(2);
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
Console.WriteLine(temp);
}
catch (AggregateException e)
{
// The exception is caught here.
foreach (var v in e.InnerExceptions)
{
Console.WriteLine(v.Message);
}
}
}
Update1: I tried another example. Still I did not use Block.Complete() but I thought when the first block's completed, the result is passed into the second block automatically.
private static async Task test3()
{
TransformManyBlock<int, int> tmb = new TransformManyBlock<int, int>((i) => { return new int[] {i, i + 1}; });
ActionBlock<int> ab = new ActionBlock<int>((i) => Console.WriteLine(i));
tmb.LinkTo(ab);
for (int i = 0; i < 4; i++)
{
tmb.Post(i);
}
//tmb.Complete();
await ab.Completion;
Console.WriteLine("Finished post");
}
This part of the code:
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
is first (asynchronously) waiting for the subtraction block to complete, and then attempting to retrieve an output from the block.
There are two problems: the source block is never completed, and the code is attempting to retrieve output from a completed block. Once a block has completed, it will not produce any more data.
(I assume you're referring to the example in recipe 4.2, which will post 1, causing the exception, which completes the block in a faulted state).
So, you can fix this test by completing the source block (and the completion will propagate along the link to the subtractBlock automatically), and by reading the output before (asynchronously) waiting for subtractBlock to complete:
multiplyBlock.Complete();
int temp = subtractBlock.Receive();
await subtractBlock.Completion;

TransformBlock never completes

I'm trying to wrap my head around "completion" in TPL Dataflow blocks. In particular, the TransformBlock doesn't seem to ever complete. Why?
Sample program
My code calculates the square of all integers from 1 to 1000. I used a BufferBlock and a TransformBlock for that. Later in my code, I await completion of the TransformBlock. The block never actually completes though, and I don't understand why.
static void Main(string[] args)
{
var bufferBlock = new BufferBlock<int>();
var calculatorBlock = new TransformBlock<int, int>(i =>
{
Console.WriteLine("Calculating {0}²", i);
return (int)Math.Pow(i, 2);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });
using (bufferBlock.LinkTo(calculatorBlock, new DataflowLinkOptions { PropagateCompletion = true }))
{
foreach (var number in Enumerable.Range(1, 1000))
{
bufferBlock.Post(number);
}
bufferBlock.Complete();
// This line never completes
calculatorBlock.Completion.Wait();
// Unreachable code
IList<int> results;
if (calculatorBlock.TryReceiveAll(out results))
{
foreach (var result in results)
{
Console.WriteLine("x² = {0}", result);
}
}
}
}
At first I thought I created a deadlock situation, but that doesn't seem to be true. When I inspected the calculatorBlock.Completion task in the debugger, its Status property was set to WaitingForActivation. That was the moment when my brain blue screened.
The reason your pipeline hangs is that both BufferBlock and TransformBlock evidently don't complete until they emptied themselves of items (I guess that the desired behavior of IPropagatorBlocks although I haven't found documentation on it).
This can be verified with a more minimal example:
var bufferBlock = new BufferBlock<int>();
bufferBlock.Post(0);
bufferBlock.Complete();
bufferBlock.Completion.Wait();
This blocks indefinitely unless you add bufferBlock.Receive(); before completing.
If you remove the items from your pipeline before blocking by either your TryReceiveAll code block, connecting another ActionBlock to the pipeline, converting your TransformBlock to an ActionBlock or any other way this will no longer block.
About your specific solution, it seems that you don't need a BufferBlock or TransformBlock at all since blocks have an input queue for themselves and you don't use the return value of the TransformBlock. This could be achieved with just an ActionBlock:
var block = new ActionBlock<int>(
i =>
{
Console.WriteLine("Calculating {0}²", i);
Console.WriteLine("x² = {0}", (int)Math.Pow(i, 2));
},
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 8});
foreach (var number in Enumerable.Range(1, 1000))
{
block.Post(number);
}
block.Complete();
block.Completion.Wait();
I think I understand it now. An instance of TransformBlock is not considered "complete" until the following conditions are met:
TransformBlock.Complete() has been called
InputCount == 0 – the block has applied its transformation to every incoming element
OutputCount == 0 – all transformed elements have left the output buffer
In my program, there is no target block that is linked to the source TransformBlock, so the source block never gets to flush its output buffer.
As a workaround, I added a second BufferBlock that is used to store transformed elements.
static void Main(string[] args)
{
var inputBufferBlock = new BufferBlock<int>();
var calculatorBlock = new TransformBlock<int, int>(i =>
{
Console.WriteLine("Calculating {0}²", i);
return (int)Math.Pow(i, 2);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });
var outputBufferBlock = new BufferBlock<int>();
using (inputBufferBlock.LinkTo(calculatorBlock, new DataflowLinkOptions { PropagateCompletion = true }))
using (calculatorBlock.LinkTo(outputBufferBlock, new DataflowLinkOptions { PropagateCompletion = true }))
{
foreach (var number in Enumerable.Range(1, 1000))
{
inputBufferBlock.Post(number);
}
inputBufferBlock.Complete();
calculatorBlock.Completion.Wait();
IList<int> results;
if (outputBufferBlock.TryReceiveAll(out results))
{
foreach (var result in results)
{
Console.WriteLine("x² = {0}", result);
}
}
}
}
TransformBlock needs a ITargetBlock where he can post the transformation.
var writeCustomerBlock = new ActionBlock<int>(c => Console.WriteLine(c));
transformBlock.LinkTo(
writeCustomerBlock, new DataflowLinkOptions
{
PropagateCompletion = true
});
After this it completes.

Categories