Why does TPL Dataflow block.LinkTo not give any output? - c#

I am quite new to the topic of TPL Dataflow. I tested the following example from the book Concurrency in C#, and I can't figure out why there is no output, which should be 2*2-2 = 2:
static void Main(string[] args)
{
//Task tt = test();
Task tt = test1();
Console.ReadLine();
}
static async Task test1()
{
try
{
var multiplyBlock = new TransformBlock<int, int>(item =>
{
if (item == 1)
throw new InvalidOperationException("Blech.");
return item * 2;
});
var subtractBlock = new TransformBlock<int, int>(item => item - 2);
multiplyBlock.LinkTo(subtractBlock,
new DataflowLinkOptions { PropagateCompletion = true });
multiplyBlock.Post(2);
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
Console.WriteLine(temp);
}
catch (AggregateException e)
{
// The exception is caught here.
foreach (var v in e.InnerExceptions)
{
Console.WriteLine(v.Message);
}
}
}
Update 1: I tried another example. I still did not call Block.Complete(), but I thought that when the first block completes, the result is passed into the second block automatically.
private static async Task test3()
{
TransformManyBlock<int, int> tmb = new TransformManyBlock<int, int>((i) => { return new int[] {i, i + 1}; });
ActionBlock<int> ab = new ActionBlock<int>((i) => Console.WriteLine(i));
tmb.LinkTo(ab);
for (int i = 0; i < 4; i++)
{
tmb.Post(i);
}
//tmb.Complete();
await ab.Completion;
Console.WriteLine("Finished post");
}

This part of the code:
await subtractBlock.Completion;
int temp = subtractBlock.Receive();
is first (asynchronously) waiting for the subtraction block to complete, and then attempting to retrieve an output from the block.
There are two problems: the source block is never completed, and the code is attempting to retrieve output from a completed block. Once a block has completed, it will not produce any more data.
(I assume you're referring to the example in recipe 4.2, which will post 1, causing the exception, which completes the block in a faulted state).
So, you can fix this test by completing the source block (and the completion will propagate along the link to the subtractBlock automatically), and by reading the output before (asynchronously) waiting for subtractBlock to complete:
multiplyBlock.Complete();
int temp = subtractBlock.Receive();
await subtractBlock.Completion;
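Putting the pieces together, a corrected test1 might look like this sketch (same blocks as in the question; only the Complete() call and the ordering change):
static async Task test1()
{
    var multiplyBlock = new TransformBlock<int, int>(item =>
    {
        if (item == 1)
            throw new InvalidOperationException("Blech.");
        return item * 2;
    });
    var subtractBlock = new TransformBlock<int, int>(item => item - 2);
    multiplyBlock.LinkTo(subtractBlock,
        new DataflowLinkOptions { PropagateCompletion = true });

    multiplyBlock.Post(2);
    multiplyBlock.Complete();            // completion propagates to subtractBlock

    int temp = subtractBlock.Receive();  // read the output before the block completes
    Console.WriteLine(temp);             // prints 2

    await subtractBlock.Completion;
}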

Related

Asynchronous Task, video buffering

I am trying to understand Tasks in C# but am still having some problems. I am trying to create an application containing video. The main purpose is to read the video from a file (I am using Emgu.CV), send it via TCP/IP to a board for processing, and then get it back as a (real-time) stream. At first I did it serially: reading a bitmap, sending/receiving it to and from the board, and plotting it. But reading the bitmaps and plotting them takes too much time. I would like to have transmit and receive FIFO buffers that store the video frames, and a separate task that does the job of sending and receiving each frame. So I would like to do it in parallel. I thought I should create 3 tasks:
tasks.Add(Task.Run(() => Video_load(video_path)));
tasks.Add(Task.Run(() => Video_Send_Recv(video_path)));
tasks.Add(Task.Run(() => VideoDisp_hw(32)));
which I would like to run in parallel. What type of object should I use? A ConcurrentQueue? A BufferBlock? Or just a List?
Thanks for the advice! I would like to ask something. I am trying to create a simple console program with two TPL Dataflow blocks. One block would be a TransformBlock (taking a message, e.g. "start", and loading data into a List) and the other block would be an ActionBlock (just reading the data from the list and printing it). Here is the code:
namespace TPL_Dataflow
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Hello World!");
Random randn = new Random();
var loadData = new TransformBlock<string, List<int>>(async sample_string =>
{
List<int> input_data = new List<int>();
int cnt = 0;
if (sample_string == "start")
{
Console.WriteLine("Inside loadData");
while (cnt < 16)
{
input_data.Add(randn.Next(1, 255));
await Task.Delay(1500);
Console.WriteLine("Cnt");
cnt++;
}
}
else
{
Console.WriteLine("Not started yet");
}
return input_data;
});
var PrintData = new ActionBlock<List<int>>(async input_data =>
{
while(input_data.Count > 0)
{
Console.WriteLine("output Data = " + input_data.First());
await Task.Delay(1000);
input_data.RemoveAt(0);
}
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
loadData.LinkTo(PrintData, input_data => input_data.Count() >0 );
//loadData.LinkTo(PrintData, linkOptions);
loadData.SendAsync("start");
loadData.Complete();
PrintData.Completion.Wait();
}
}
}
But it seems to work in a serial way. What am I doing wrong? I tried to make the while loops async. I would like to do the two things in parallel: as soon as data is available from the List, it should be printed.
You could use a TransformManyBlock<string, int> as the producer block, and an ActionBlock<int> as the consumer block. The TransformManyBlock would be instantiated with the constructor that accepts a Func<string, IEnumerable<int>> delegate, and passed an iterator method (the Produce method in the example below) that yields values one by one:
Random random = new Random();
var producer = new TransformManyBlock<string, int>(Produce);
IEnumerable<int> Produce(string message)
{
if (message == "start")
{
int cnt = 0;
while (cnt < 16)
{
int value;
lock (random) value = random.Next(1, 255);
Console.WriteLine($"Producing #{value}");
yield return value;
Thread.Sleep(1500);
cnt++;
}
}
else
{
yield break;
}
}
var consumer = new ActionBlock<int>(async value =>
{
Console.WriteLine($"Received: {value}");
await Task.Delay(1000);
});
producer.LinkTo(consumer, new() { PropagateCompletion = true });
producer.Post("start");
producer.Complete();
consumer.Completion.Wait();
Unfortunately the producer has to block the worker thread during the idle period between yielding each value (Thread.Sleep(1500);), because the TransformManyBlock currently does not have a constructor that accepts a Func<string, IAsyncEnumerable<int>>. This will probably be fixed in the next release of the TPL Dataflow library. You could track this GitHub issue to be informed about when this feature is released.
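For reference, here is a sketch of how the producer could look on a later version of the System.Threading.Tasks.Dataflow package that does include an IAsyncEnumerable<TOutput> constructor (whether your version has it is an assumption you would need to verify):
async IAsyncEnumerable<int> ProduceAsync(string message)
{
    if (message != "start") yield break;
    for (int cnt = 0; cnt < 16; cnt++)
    {
        int value;
        lock (random) value = random.Next(1, 255);
        Console.WriteLine($"Producing #{value}");
        yield return value;
        await Task.Delay(1500); // no worker thread is blocked during the idle period
    }
}
var producer = new TransformManyBlock<string, int>(ProduceAsync);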
Alternative solution: Instead of explicitly linking the producer and the consumer, you could keep them unlinked and manually send the values produced by the producer to the consumer. In this case both blocks would be ActionBlocks:
Random random = new Random();
var consumer = new ActionBlock<int>(async value =>
{
Console.WriteLine($"Received: {value}");
await Task.Delay(1000);
});
var producer = new ActionBlock<string>(async message =>
{
if (message == "start")
{
int cnt = 0;
while (cnt < 16)
{
int value;
lock (random) value = random.Next(1, 255);
Console.WriteLine($"Producing #{value}");
var accepted = await consumer.SendAsync(value);
if (!accepted) break; // The consumer has failed
await Task.Delay(1500);
cnt++;
}
}
});
PropagateCompletion(producer, consumer);
producer.Post("start");
producer.Complete();
consumer.Completion.Wait();
async void PropagateCompletion(IDataflowBlock source, IDataflowBlock target)
{
try { await source.Completion.ConfigureAwait(false); } catch { }
var ex = source.Completion.IsFaulted ? source.Completion.Exception : null;
if (ex != null) target.Fault(ex); else target.Complete();
}
The main difficulty with this approach is how to propagate the completion of the producer to the consumer, so that eventually both blocks complete. Obviously you can't use the new DataflowLinkOptions { PropagateCompletion = true } configuration, since the blocks are not linked explicitly. You also can't manually Complete the consumer, because in that case it would stop accepting values from the producer prematurely. The solution to this problem is the PropagateCompletion method shown in the example above.

How do I run a method both in parallel and sequentially in C#?

I have a C# console app. In this app, I have a method that I will call DoWorkAsync. For the context of this question, this method looks like this:
private async Task<string> DoWorkAsync()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
await Task.CompletedTask;
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
I call DoWorkAsync from another method that determines a) how many times it will be run and b) whether each call will run in parallel or sequentially. That method looks like this:
private async Task<Task<string>[]> DoWork(int iterations, bool runInParallel)
{
var tasks = new List<Task<string>>();
for (var i=0; i<iterations; i++)
{
if (runInParallel)
{
var task = Task.Run(() => DoWorkAsync());
tasks.Add(task);
}
else
{
await DoWorkAsync();
}
}
return tasks.ToArray();
}
After all of the tasks are completed, I want to display the results. To do this, I have code that looks like this:
var random = new Random();
var tasks = await DoWork(random.Next(10, 101), runInParallel);
Task.WaitAll(tasks);
foreach (var task in tasks)
{
Console.WriteLine(task.Result);
}
This code works as expected if the code runs in parallel (i.e. runInParallel is true). However, when runInParallel is false (i.e. I want to run the tasks sequentially), the task array doesn't get populated, so the caller doesn't have any results to work with. I don't know how to fix it though. I'm not sure how to add the method call as a task that will run sequentially. I understand that the idea behind tasks is to run them in parallel; however, I need to be able to toggle between parallel and sequential execution.
Thank you!
the Task array doesn't get populated.
So populate it:
else
{
var task = DoWorkAsync();
tasks.Add(task);
await task;
}
P.S.
Also, your DoWorkAsync looks kind of wrong to me: why Thread.Sleep and not await Task.Delay? That is the more correct way to simulate asynchronous execution, and you won't need await Task.CompletedTask that way. And if you expect DoWorkAsync to be CPU-bound, just write it like this:
private Task<string> DoWorkAsync()
{
return Task.Run(() =>
{
// your cpu bound work
return "string";
});
}
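For the non-CPU-bound flavor, a sketch of DoWorkAsync that simulates asynchronous work with Task.Delay instead of blocking the thread:
private async Task<string> DoWorkAsync()
{
    await Task.Delay(5000); // simulate asynchronous I/O instead of Thread.Sleep
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}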
After that you can do something like this (for both async/cpu bound work):
private async Task<string[]> DoWork(int iterations, bool runInParallel)
{
if(runInParallel)
{
var tasks = Enumerable.Range(0, iterations)
.Select(i => DoWorkAsync());
return await Task.WhenAll(tasks);
}
else
{
var result = new string[iterations];
for (var i = 0; i < iterations; i++)
{
result[i] = await DoWorkAsync();
}
return result;
}
}
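A usage sketch, assuming the caller is itself async:
string[] results = await DoWork(10, runInParallel: true);
foreach (var result in results)
{
    Console.WriteLine(result);
}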
Why is DoWorkAsync an async method?
It isn't currently doing anything asynchronous.
It seems that you are trying to utilise multiple threads to improve the performance of expensive CPU-bound work, so you would be better off making use of Parallel.For, which is designed for this purpose:
private string DoWork()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
private string[] DoWork(int iterations, bool runInParallel)
{
var results = new string[iterations];
if (runInParallel)
{
Parallel.For(0, iterations, i => results[i] = DoWork()); // the upper bound is exclusive, so pass iterations, not iterations - 1
}
else
{
for (int i = 0; i < iterations; i++) results[i] = DoWork();
}
return results;
}
Then:
var random = new Random();
var serial = DoWork(random.Next(10, 101), false);
var parallel = DoWork(random.Next(10, 101), true);
I think you'd be better off doing the following:
Create a function that creates a list of (cold) tasks (or an array Task<string>[] for instance). No need to run them. Let's call this GetTasks() (a possible sketch is shown further down in this answer).
var jobs = GetTasks();
Then, if you want to run them "sequentially", just do
var results = new List<string>();
foreach (var job in jobs)
{
    job.Start(); // a cold task must be started, otherwise the await below never completes
    var result = await job;
    results.Add(result);
}
return results;
If you want to run them in parallel:
foreach (var job in jobs)
{
job.Start();
}
var results = await Task.WhenAll(jobs);
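For completeness, a possible shape for GetTasks (my own sketch, assuming a synchronous DoWork() like the one in the previous answer): cold tasks created with the Task constructor, so nothing runs until Start() is called.
Task<string>[] GetTasks(int iterations)
{
    return Enumerable.Range(0, iterations)
        .Select(_ => new Task<string>(() => DoWork())) // cold: not started yet
        .ToArray();
}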
Another note: all of this in itself should be a Task<string[]>; the Task<Task<... smells like a problem.

Handle exceptions with TPL Dataflow blocks

I have a simple TPL Dataflow pipeline which basically does some tasks.
I noticed that when there is an exception in any of the dataflow blocks, it wasn't getting caught in the initial parent block caller.
I have added some manual code to check for exceptions, but that doesn't seem like the right approach:
if (readBlock.Completion.Exception != null
|| saveBlockJoinedProcess.Completion.Exception != null
|| processBlock1.Completion.Exception != null
|| processBlock2.Completion.Exception != null)
{
throw readBlock.Completion.Exception;
}
I had a look online to see what the suggested approach is but didn't see anything obvious, so I created some sample code below and was hoping to get some guidance on a better solution:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace TPLDataflow
{
class Program
{
static void Main(string[] args)
{
try
{
//ProcessB();
ProcessA();
}
catch (Exception e)
{
Console.WriteLine("Exception in Process!");
throw new Exception($"exception:{e}");
}
Console.WriteLine("Processing complete!");
Console.ReadLine();
}
private static void ProcessB()
{
Task.WhenAll(Task.Run(() => DoSomething(1, "ProcessB"))).Wait();
}
private static void ProcessA()
{
var random = new Random();
var readBlock = new TransformBlock<int, int>(x =>
{
try { return DoSomething(x, "readBlock"); }
catch (Exception e) { throw e; }
}); //1
var braodcastBlock = new BroadcastBlock<int>(i => i);
var processBlock1 = new TransformBlock<int, int>(x =>
DoSomethingAsync(5, "processBlock1")); //2
var processBlock2 = new TransformBlock<int, int>(x =>
DoSomethingAsync(2, "processBlock2")); //3
//var saveBlock =
// new ActionBlock<int>(
// x => Save(x)); //4
var saveBlockJoinedProcess =
new ActionBlock<Tuple<int, int>>(
x => SaveJoined(x.Item1, x.Item2)); //4
var saveBlockJoin = new JoinBlock<int, int>();
readBlock.LinkTo(braodcastBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
braodcastBlock.LinkTo(processBlock1,
new DataflowLinkOptions { PropagateCompletion = true }); //5
braodcastBlock.LinkTo(processBlock2,
new DataflowLinkOptions { PropagateCompletion = true }); //6
processBlock1.LinkTo(
saveBlockJoin.Target1); //7
processBlock2.LinkTo(
saveBlockJoin.Target2); //8
saveBlockJoin.LinkTo(saveBlockJoinedProcess,
new DataflowLinkOptions { PropagateCompletion = true });
readBlock.Post(1); //10
//readBlock.Post(2); //10
Task.WhenAll(processBlock1.Completion,processBlock2.Completion)
.ContinueWith(_ => saveBlockJoin.Complete());
readBlock.Complete(); //12
saveBlockJoinedProcess.Completion.Wait(); //13
if (readBlock.Completion.Exception != null
|| saveBlockJoinedProcess.Completion.Exception != null
|| processBlock1.Completion.Exception != null
|| processBlock2.Completion.Exception != null)
{
throw readBlock.Completion.Exception;
}
}
private static int DoSomething(int i, string method)
{
Console.WriteLine($"Do Something, callng method : { method}");
throw new Exception("Fake Exception!");
return i;
}
private static async Task<int> DoSomethingAsync(int i, string method)
{
Console.WriteLine($"Do SomethingAsync");
throw new Exception("Fake Exception!");
await Task.Delay(new TimeSpan(0, 0, i));
Console.WriteLine($"Do Something : {i}, callng method : { method}");
return i;
}
private static void Save(int x)
{
Console.WriteLine("Save!");
}
private static void SaveJoined(int x, int y)
{
Thread.Sleep(new TimeSpan(0, 0, 10));
Console.WriteLine("Save Joined!");
}
}
}
I had a look online to see what the suggested approach is but didn't see anything obvious.
If you have a pipeline (more or less), then the common approach is to use PropagateCompletion to shut down the pipe. If you have more complex topologies, then you would need to complete blocks by hand.
In your case, you have an attempted propagation here:
Task.WhenAll(
processBlock1.Completion,
processBlock2.Completion)
.ContinueWith(_ => saveBlockJoin.Complete());
But this code will not propagate exceptions. When both processBlock1.Completion and processBlock2.Completion complete (whether successfully or with an error), saveBlockJoin is completed successfully.
A better solution would be to use await instead of ContinueWith:
async Task PropagateToSaveBlockJoin()
{
try
{
await Task.WhenAll(processBlock1.Completion, processBlock2.Completion);
saveBlockJoin.Complete();
}
catch (Exception ex)
{
((IDataflowBlock)saveBlockJoin).Fault(ex);
}
}
_ = PropagateToSaveBlockJoin();
Using await encourages you to handle exceptions, which you can do by passing them to Fault to propagate the exception.
Propagating errors backward in the pipeline is not supported by TPL Dataflow out of the box, which is especially annoying when the blocks have a bounded capacity: in that case an error in a downstream block may cause the blocks in front of it to block indefinitely. The only solution I know of is to use the cancellation feature and cancel all blocks in case any one of them fails. Here is how it can be done. First, create a CancellationTokenSource:
var cts = new CancellationTokenSource();
Then create the blocks one by one, embedding the same CancellationToken in the options of all of them:
var options = new ExecutionDataflowBlockOptions()
{ BoundedCapacity = 10, CancellationToken = cts.Token };
var block1 = new TransformBlock<double, double>(Math.Sqrt, options);
var block2 = new ActionBlock<double>(Console.WriteLine, options);
Then link the blocks together, including the PropagateCompletion setting:
block1.LinkTo(block2, new DataflowLinkOptions { PropagateCompletion = true });
Finally use an extension method to trigger the cancellation of the CancellationTokenSource in case of an exception:
block1.OnFaultedCancel(cts);
block2.OnFaultedCancel(cts);
The OnFaultedCancel extension method is shown below:
public static class DataflowExtensions
{
public static void OnFaultedCancel(this IDataflowBlock dataflowBlock,
CancellationTokenSource cts)
{
dataflowBlock.Completion.ContinueWith(_ => cts.Cancel(), default,
TaskContinuationOptions.OnlyOnFaulted |
TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}
}
At first look I have only some minor points (not looking at your architecture). It seems to me that you have mixed some newer and some older constructs, and there are some parts of the code that are unnecessary.
for example:
private static void ProcessB()
{
Task.WhenAll(Task.Run(() => DoSomething(1, "ProcessB"))).Wait();
}
Using the Wait() method, if any exceptions happen they will be wrapped in a System.AggregateException. In my opinion, this is better:
private static async Task ProcessBAsync()
{
await Task.Run(() => DoSomething(1, "ProcessB"));
}
Using async/await, if an exception occurs, the await statement rethrows the first exception wrapped in the System.AggregateException. This allows you to catch concrete exception types and handle only the cases you really can handle.
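For example (a minimal sketch; InvalidOperationException is just an illustrative type, not what the sample code actually throws):
try
{
    await ProcessBAsync();
}
catch (InvalidOperationException ex)
{
    // Handle only the concrete exception type you can actually deal with;
    // anything else keeps propagating up the call stack.
    Console.WriteLine(ex.Message);
}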
Another thing is this part of your code:
private static void ProcessA()
{
var random = new Random();
var readBlock = new TransformBlock<int, int>(
x =>
{
try { return DoSomething(x, "readBlock"); }
catch (Exception e)
{
throw e;
}
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 }); //1
Why catch an exception only to rethrow it? In this case, the try-catch is redundant.
And this here:
private static void SaveJoined(int x, int y)
{
Thread.Sleep(new TimeSpan(0, 0, 10));
Console.WriteLine("Save Joined!");
}
It is much better to use await Task.Delay(...). With Task.Delay(...), no thread is blocked while waiting.
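A sketch of SaveJoined rewritten that way:
private static async Task SaveJoinedAsync(int x, int y)
{
    await Task.Delay(TimeSpan.FromSeconds(10)); // the thread is free to do other work during the wait
    Console.WriteLine("Save Joined!");
}
The ActionBlock delegate would then become async x => await SaveJoinedAsync(x.Item1, x.Item2).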

TaskFactory, Starting a new Task when one ends

I have found many examples of using TaskFactory, but I could not find anything about starting a number of tasks, watching for when one ends, and then starting another one.
I always want to have 10 tasks working.
I want something like this:
int nTotalTasks=10;
int nCurrentTask=0;
Task<bool>[] tasks = new Task<bool>[nTotalTasks];
for (int i=0; i<1000; i++)
{
string param1="test";
string param2="test";
if (nCurrentTask<10) // if there are less than 10 tasks then start another one
tasks[nCurrentTask++] = Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
});
// How can I stop the for loop until a new task is finished and start a new one?
}
Check out the Task.WaitAny method:
Waits for any of the provided Task objects to complete execution.
Example from the documentation:
var t1 = Task.Factory.StartNew(() => DoOperation1());
var t2 = Task.Factory.StartNew(() => DoOperation2());
Task.WaitAny(t1, t2);
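Applied to the loop in the question, a sketch (reusing MyClass.Method1 from the question) that keeps at most 10 tasks in flight:
var running = new List<Task<bool>>();
for (int i = 0; i < 1000; i++)
{
    string param1 = "test";
    string param2 = "test";
    int index = i; // capture the current value of the loop variable

    if (running.Count >= 10)
    {
        // Wait for any one of the running tasks to finish, then make room for a new one.
        int finished = Task.WaitAny(running.ToArray());
        running.RemoveAt(finished);
    }

    running.Add(Task.Run(() =>
    {
        MyClass cls = new MyClass();
        return cls.Method1(param1, param2, index); // takes up to 2 minutes to finish
    }));
}
Task.WaitAll(running.ToArray()); // wait for the remaining tasks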
I would use a combination of Microsoft's Reactive Framework (NuGet "Rx-Main") and TPL for this. It becomes very simple.
Here's the code:
int nTotalTasks=10;
string param1="test";
string param2="test";
IDisposable subscription =
Observable
.Range(0, 1000)
.Select(i => Observable.FromAsync(() => Task.Factory.StartNew<bool>(() =>
{
MyClass cls = new MyClass();
bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
return bRet;
})))
.Merge(nTotalTasks)
.ToArray()
.Subscribe((bool[] results) =>
{
/* Do something with the results. */
});
The key part here is the .Merge(nTotalTasks) which limits the number of concurrent tasks.
If you need to stop the processing part way through, just call subscription.Dispose() and everything gets cleaned up for you.
If you want to process each result as they are produced you can change the code from the .Merge(...) like this:
.Merge(nTotalTasks)
.Subscribe((bool result) =>
{
/* Do something with each result. */
});
This should be all you need. It's not complete, but all you need to do is wait for the first task to complete and then start the next one:
Task.WaitAny(task to wait on);
Task.Factory.StartNew()
Have you seen the BlockingCollection class? It allows you to have multiple threads running in parallel, and you can wait for the results of one task to execute another. See more information here.
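A minimal sketch of that idea (my own illustration, reusing MyClass.Method1 from the question): a bounded BlockingCollection fed by the loop and drained by a fixed number of consumer tasks.
var work = new BlockingCollection<int>(boundedCapacity: 10);
Task[] consumers = Enumerable.Range(0, 10)
    .Select(_ => Task.Run(() =>
    {
        foreach (int item in work.GetConsumingEnumerable())
        {
            MyClass cls = new MyClass();
            cls.Method1("test", "test", item); // takes up to 2 minutes to finish
        }
    }))
    .ToArray();

for (int i = 0; i < 1000; i++)
{
    work.Add(i); // blocks while the collection is full
}
work.CompleteAdding();
Task.WaitAll(consumers);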
The answer depends on whether the tasks to be scheduled are CPU or I/O bound.
For CPU-intensive work I would use the Parallel.For() API, setting the number of threads/tasks through the MaxDegreeOfParallelism property of ParallelOptions (see the sketch after these two points).
For I/O-bound work the number of concurrently executing tasks can be significantly larger than the number of available CPUs, so the strategy is to rely on async methods as much as possible, which reduces the total number of threads waiting for completion.
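For the CPU-bound case, a sketch with a capped degree of parallelism (reusing MyClass.Method1 from the question; the value 10 is just an example):
var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.For(0, 1000, options, i =>
{
    MyClass cls = new MyClass();
    cls.Method1("test", "test", i); // CPU-bound work
});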
How can I stop the for loop until a new task is finished and start a
new one?
The loop can be throttled by using await:
static void Main(string[] args)
{
var task = DoWorkAsync();
task.Wait();
// handle results
// task.Result;
Console.WriteLine("Done.");
}
async static Task<bool> DoWorkAsync()
{
const int NUMBER_OF_SLOTS = 10;
string param1="test";
string param2="test";
var results = Enumerable.Repeat(true, NUMBER_OF_SLOTS).ToArray(); // start from true so the && accumulation below works
AsyncWorkScheduler ws = new AsyncWorkScheduler(NUMBER_OF_SLOTS);
for (int i = 0; i < 1000; ++i)
{
await ws.ScheduleAsync((slotNumber) => DoWorkAsync(i, slotNumber, param1, param2, results));
}
ws.Complete();
await ws.Completion;
return results.All(r => r); // the method is declared Task<bool>, so return an aggregated result
}
async static Task DoWorkAsync(int index, int slotNumber, string param1, string param2, bool[] results)
{
results[slotNumber] = results[slotNumber] && await Task.Factory.StartNew<bool>(() =>
{
    MyClass cls = new MyClass();
    bool bRet = cls.Method1(param1, param2, index); // takes up to 2 minutes to finish
    return bRet;
});
}
A helper class, AsyncWorkScheduler, uses TPL Dataflow components as well as Task.WhenAll():
class AsyncWorkScheduler
{
public AsyncWorkScheduler(int numberOfSlots)
{
m_slots = new Task[numberOfSlots];
m_availableSlots = new BufferBlock<int>();
m_errors = new List<Exception>();
m_tcs = new TaskCompletionSource<bool>();
m_completionPending = 0;
// Initial state: all slots are available
for(int i = 0; i < m_slots.Length; ++i)
{
m_slots[i] = Task.FromResult(false);
m_availableSlots.Post(i);
}
}
public async Task ScheduleAsync(Func<int, Task> action)
{
if (Volatile.Read(ref m_completionPending) != 0)
{
throw new InvalidOperationException("Unable to schedule new items.");
}
// Acquire a slot
int slotNumber = await m_availableSlots.ReceiveAsync().ConfigureAwait(false);
// Schedule a new task for a given slot
var task = action(slotNumber);
// Store a continuation on the task to handle completion events
m_slots[slotNumber] = task.ContinueWith(t => HandleCompletedTask(t, slotNumber), TaskContinuationOptions.ExecuteSynchronously);
}
public async void Complete()
{
if (Interlocked.CompareExchange(ref m_completionPending, 1, 0) != 0)
{
return;
}
// Signal the queue's completion
m_availableSlots.Complete();
await Task.WhenAll(m_slots).ConfigureAwait(false);
// Set completion
if (m_errors.Count != 0)
{
m_tcs.TrySetException(m_errors);
}
else
{
m_tcs.TrySetResult(true);
}
}
public Task Completion
{
get
{
return m_tcs.Task;
}
}
void SetFailed(Exception error)
{
lock(m_errors)
{
m_errors.Add(error);
}
}
void HandleCompletedTask(Task task, int slotNumber)
{
if (task.IsFaulted || task.IsCanceled)
{
SetFailed(task.Exception);
return;
}
if (Volatile.Read(ref m_completionPending) == 1)
{
return;
}
// Release a slot
m_availableSlots.Post(slotNumber);
}
int m_completionPending;
List<Exception> m_errors;
BufferBlock<int> m_availableSlots;
TaskCompletionSource<bool> m_tcs;
Task[] m_slots;
}

TransformBlock never completes

I'm trying to wrap my head around "completion" in TPL Dataflow blocks. In particular, the TransformBlock doesn't seem to ever complete. Why?
Sample program
My code calculates the square of all integers from 1 to 1000. I used a BufferBlock and a TransformBlock for that. Later in my code, I await completion of the TransformBlock. The block never actually completes though, and I don't understand why.
static void Main(string[] args)
{
var bufferBlock = new BufferBlock<int>();
var calculatorBlock = new TransformBlock<int, int>(i =>
{
Console.WriteLine("Calculating {0}²", i);
return (int)Math.Pow(i, 2);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });
using (bufferBlock.LinkTo(calculatorBlock, new DataflowLinkOptions { PropagateCompletion = true }))
{
foreach (var number in Enumerable.Range(1, 1000))
{
bufferBlock.Post(number);
}
bufferBlock.Complete();
// This line never completes
calculatorBlock.Completion.Wait();
// Unreachable code
IList<int> results;
if (calculatorBlock.TryReceiveAll(out results))
{
foreach (var result in results)
{
Console.WriteLine("x² = {0}", result);
}
}
}
}
At first I thought I had created a deadlock situation, but that doesn't seem to be true. When I inspected the calculatorBlock.Completion task in the debugger, its Status property was set to WaitingForActivation. That was the moment my brain blue-screened.
The reason your pipeline hangs is that both BufferBlock and TransformBlock evidently don't complete until they have emptied themselves of items (I guess that is the desired behavior for IPropagatorBlocks, although I haven't found documentation on it).
This can be verified with a more minimal example:
var bufferBlock = new BufferBlock<int>();
bufferBlock.Post(0);
bufferBlock.Complete();
bufferBlock.Completion.Wait();
This blocks indefinitely unless you add bufferBlock.Receive(); before completing.
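In other words, a version of that minimal example that does complete (a sketch):
var bufferBlock = new BufferBlock<int>();
bufferBlock.Post(0);
bufferBlock.Complete();
bufferBlock.Receive();          // drain the buffered item first
bufferBlock.Completion.Wait();  // now completes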
If you remove the items from your pipeline before blocking (either with your TryReceiveAll code, by connecting another ActionBlock to the pipeline, by converting your TransformBlock to an ActionBlock, or in any other way), it will no longer block.
About your specific solution, it seems that you don't need a BufferBlock or TransformBlock at all, since blocks have their own input queue and you don't use the return value of the TransformBlock. This could be achieved with just an ActionBlock:
var block = new ActionBlock<int>(
i =>
{
Console.WriteLine("Calculating {0}²", i);
Console.WriteLine("x² = {0}", (int)Math.Pow(i, 2));
},
new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 8});
foreach (var number in Enumerable.Range(1, 1000))
{
block.Post(number);
}
block.Complete();
block.Completion.Wait();
I think I understand it now. An instance of TransformBlock is not considered "complete" until the following conditions are met:
TransformBlock.Complete() has been called
InputCount == 0 – the block has applied its transformation to every incoming element
OutputCount == 0 – all transformed elements have left the output buffer
In my program, there is no target block that is linked to the source TransformBlock, so the source block never gets to flush its output buffer.
As a workaround, I added a second BufferBlock that is used to store transformed elements.
static void Main(string[] args)
{
var inputBufferBlock = new BufferBlock<int>();
var calculatorBlock = new TransformBlock<int, int>(i =>
{
Console.WriteLine("Calculating {0}²", i);
return (int)Math.Pow(i, 2);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 8 });
var outputBufferBlock = new BufferBlock<int>();
using (inputBufferBlock.LinkTo(calculatorBlock, new DataflowLinkOptions { PropagateCompletion = true }))
using (calculatorBlock.LinkTo(outputBufferBlock, new DataflowLinkOptions { PropagateCompletion = true }))
{
foreach (var number in Enumerable.Range(1, 1000))
{
inputBufferBlock.Post(number);
}
inputBufferBlock.Complete();
calculatorBlock.Completion.Wait();
IList<int> results;
if (outputBufferBlock.TryReceiveAll(out results))
{
foreach (var result in results)
{
Console.WriteLine("x² = {0}", result);
}
}
}
}
A TransformBlock needs an ITargetBlock to which it can post the transformed output.
var writeCustomerBlock = new ActionBlock<int>(c => Console.WriteLine(c));
transformBlock.LinkTo(
writeCustomerBlock, new DataflowLinkOptions
{
PropagateCompletion = true
});
After this, it completes.
