Async/Await or Task.Run in Console Application/Windows Service - c#

I have been researching (including looking at all other SO posts on this topic) the best way to implement a (most likely) Windows Service worker that will pull items of work from a database and process them in parallel asynchronously in a 'fire-and-forget' manner in the background (the work item management will all be handled in the asynchronous method). The work items will be web service calls and database queries. There will be some throttling applied to the producer of these work items to ensure some kind of measured approach to scheduling the work. The examples below are very basic and are just there to highlight the logic of the while loop and for loop in place. Which is the ideal method or does it not matter? Is there a more appropriate/performant way of achieving this?
async/await...
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Async";
Task.Run(() => AsyncMain());
Console.ReadLine();
}
private static async void AsyncMain()
{
while (true)
{
// Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = DoSomethingAsync(counter.ToString());
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static async Task<string> DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
return "fire and forget so not really interested";
}
Task.Run...
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Task";
while (true)
{
// Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = Task.Run(() => { DoSomethingAsync(counter.ToString()); });
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static string DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
return "fire and forget so not really interested";
}

pull items of work from a database and process them in parallel asynchronously in a 'fire-and-forget' manner in the background
Technically, you want concurrency. Whether you want asynchronous concurrency or parallel concurrency remains to be seen...
The work items will be web service calls and database queries.
The work is I/O-bound, so that implies asynchronous concurrency as the more natural approach.
There will be some throttling applied to the producer of these work items to ensure some kind of measured approach to scheduling the work.
The idea of a producer/consumer queue is implied here. That's one option. TPL Dataflow provides some nice producer/consumer queues that are async-compatible and support throttling.
Alternatively, you can do the throttling yourself. For asynchronous code, there's a built-in throttling mechanism called SemaphoreSlim.
TPL Dataflow approach, with throttling:
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Async";
var x = Task.Run(() => MainAsync());
Console.ReadLine();
}
private static async Task MainAsync()
{
var blockOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 7
};
var block = new ActionBlock<string>(DoSomethingAsync, blockOptions);
while (true)
{
var dbData = await ...; // Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
block.Post(counter.ToString());
counter++;
await Task.Delay(50); // asynchronous pause; Thread.Sleep would block the thread
}
await Task.Delay(1000);
}
}
private static async Task DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
}
Asynchronous concurrency approach with manual throttling:
private static int counter = 1;
private static SemaphoreSlim semaphore = new SemaphoreSlim(7);
static void Main(string[] args)
{
Console.Title = "Async";
var x = Task.Run(() => MainAsync());
Console.ReadLine();
}
private static async Task MainAsync()
{
while (true)
{
var dbData = await ...; // Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = DoSomethingAsync(counter.ToString());
counter++;
await Task.Delay(50); // asynchronous pause; Thread.Sleep would block the thread
}
await Task.Delay(1000);
}
}
private static async Task DoSomethingAsync(string jobNumber)
{
await semaphore.WaitAsync();
try
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
}
finally
{
semaphore.Release();
}
}
As a final note, I hardly ever recommend my own book on SO, but I do think it would really benefit you. In particular, sections 8.10 (Blocking/Asynchronous Queues), 11.5 (Throttling), and 4.4 (Throttling Dataflow Blocks).

First of all, let's fix a few things.
In the second example you are calling
Task.Delay(5000);
without await. This is a bad idea: it creates a new Task that completes after 5 seconds, but nothing waits for it, so the method carries on immediately. Task.Delay is only useful when it is awaited. Do not reach for Task.Delay(5000).Wait() either: it blocks the calling thread for the whole delay, and blocking on tasks can deadlock when a continuation needs a thread or context that the blocked caller is holding.
In your second example you are effectively trying to make the DoSomethingAsync method synchronous, so let's call it DoSomethingSync and replace the Task.Delay(5000); with Thread.Sleep(5000);
Now, the second example is almost the old-school ThreadPool.QueueUserWorkItem, and there is nothing wrong with that as long as you are not using an already-async API inside. Used for fire-and-forget, Task.Run and ThreadPool.QueueUserWorkItem are essentially the same thing. I would use the latter for clarity.
This slowly leads us to the answer to the main question. Async or not async - this is the question! I would say: "Do not create async methods unless you have to use some async IO inside your code". If, however, there is an async API you have to use, then the first approach is what those who read your code years later would expect.
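To make the point about Task.Delay concrete, here is a minimal sketch that times a discarded Task.Delay against an awaited one. The class and method names (DelayDemo, ElapsedWithoutAwait, ElapsedWithAwaitAsync) are hypothetical and exist only for illustration. The discarded call should return almost immediately, while the awaited call pauses the method for roughly the requested time.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class DelayDemo
{
    // Starts a delay but never awaits it: the method does not pause.
    public static long ElapsedWithoutAwait(int milliseconds)
    {
        var watch = Stopwatch.StartNew();
        _ = Task.Delay(milliseconds); // the timer starts, but the task is discarded
        return watch.ElapsedMilliseconds;
    }

    // Awaits the delay: the method resumes only after the delay has elapsed.
    public static async Task<long> ElapsedWithAwaitAsync(int milliseconds)
    {
        var watch = Stopwatch.StartNew();
        await Task.Delay(milliseconds);
        return watch.ElapsedMilliseconds;
    }
}
```

Calling DelayDemo.ElapsedWithoutAwait(5000) is the situation in the second example of the question: the job is logged as completed without any pause having taken place.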

Related

How should I use Task.Run in my code for proper scalability and performance?

I started to have HUGE doubts regarding my code and I need some advice from more experienced programmers.
In my application on the button click, the application runs a command, that is calling a ScrapJockeys method:
if (UpdateJockeysPl) await ScrapJockeys(JPlFrom, JPlTo + 1, "jockeysPl"); //1 - 1049
ScrapJockeys is triggering a for loop, repeating code block between 20K - 150K times (depends on the case). Inside the loop, I need to call a service method, where the execution of the method takes a lot of time. Also, I wanted to have the ability of cancellation of the loop and everything that is going on inside of the loop/method.
Right now I am with a method with a list of tasks, and inside of the loop is triggered a Task.Run. Inside of each task, I am calling an awaited service method, which reduces execution time of everything to 1/4 comparing to synchronous code. Also, each task has assigned a cancellation token, like in the example GitHub link:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
int j = i;
Task task = Task.Run(async () =>
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j);
//doing some stuff with results in here
}, TokenSource.Token);
tasks.Add(task);
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//doing some stuff with UI props in here
}
}
So about my question, is there everything fine with my code? According to this article:
Many async newbies start off by trying to treat asynchronous tasks the
same as parallel (TPL) tasks and this is a major misstep.
What should I use then?
And according to this article:
On a busy server, this kind of implementation can kill scalability.
So how am I supposed to do it?
Please note that the service interface method signature is Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index);
And also I am not 100% sure that I am using Task.Run correctly within my service class. The methods inside are wrapping the code inside await Task.Run(() =>, like in the example GitHub link:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
await Task.Run(() =>
{
//do some time consuming things
});
return jockey;
}
As far as I understand from the articles, this is a kind of anti-pattern. But I am confused a bit. Based on this SO reply, it should be fine...? If not, how to replace it?
On the UI side, you should be using Task.Run when you have CPU-bound code that is long enough that you need to move it off the UI thread. This is completely different than the server side, where using Task.Run at all is an anti-pattern.
In your case, all your code seems to be I/O-based, so I don't see a need for Task.Run at all.
There is a statement in your question that conflicts with the provided code:
I am calling an awaited service method
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
await Task.Run(() =>
{
//do some time consuming things
});
}
The lambda passed to Task.Run is not async, so the service method cannot possibly be awaited. And indeed it is not.
A better solution would be to load the HTML asynchronously (e.g., using HttpClient.GetStringAsync), and then call HtmlDocument.LoadHtml, something like this:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
...
string link = sb.ToString();
var html = await httpClient.GetStringAsync(link).ConfigureAwait(false);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
if (jockey.Name == null)
...
return jockey;
}
And also remove the Task.Run from your for loop:
private async Task ScrapJockey(int j, string dataType)
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j).ConfigureAwait(false);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j).ConfigureAwait(false);
//doing some stuff with results in here
}
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
tasks.Add(ScrapJockey(i, dataType)); // pass the index that the original lambda captured as j
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//doing some stuff with UI props in here
}
}
As far as I understand from the articles, this is a kind of anti-pattern.
It is an anti-pattern. But if you can't modify the service implementation, you should at least be able to execute the tasks in parallel. Something like this:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
ConcurrentBag<Task> tasks = new ConcurrentBag<Task>();
ParallelOptions parallelLoopOptions = new ParallelOptions() { CancellationToken = CancellationToken };
Parallel.For(startIndex, stopIndex, parallelLoopOptions, i =>
{
int j = i;
switch (dataType)
{
case "jockeysPl":
tasks.Add(_scrapServices.ScrapSingleJockeyPlAsync(j));
break;
case "jockeysCz":
tasks.Add(_scrapServices.ScrapSingleJockeyCzAsync(j));
break;
}
});
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//doing some stuff with UI props in here
}
}

Task.ContinueWith seems to be called more often than there are actual tasks

First, this is from something much bigger and yes, I could completely avoid this using await under normal/other circumstances. For anyone interested, I'll explain below.
To track how many tasks I still have left before my program continues, I've built the following:
A counter:
private static int counter;
Some method:
public static void Test()
{
List<Task> tasks = new List<Task>();
for (int i = 0; i < 10000; i++)
{
TaskCompletionSource<object> tcs = new TaskCompletionSource<object>();
var task = DoTaskWork();
task.ContinueWith(t => // After DoTaskWork
{
// [...] Use t's Result
counter--; // Decrease counter
tcs.SetResult(null); // Finish the task that the UI or whatever is waiting for
});
tasks.Add(tcs.Task); // Store tasks to wait for
}
Task.WaitAll(tasks.ToArray()); // Wait for all tasks that actually only finish in the ContinueWith
Console.WriteLine(counter);
}
My super heavy work to do:
private static Task DoTaskWork()
{
counter++; // Increase counter
return Task.Delay(500);
}
Now, interestingly I do not receive the number 0 at the end when looking at counter. Instead, the number varies with each execution. Why is this? I tried various tests, but cannot find the reason for the irregularity. With the TaskCompletionSource I believed this to be reliable. Thanks.
Now, for anyone that is interested in why I do this:
I need to create loads of tasks without starting them. For this I need to use the Task constructor (one of its rare use cases). Its disadvantage to Task.Run() is that it cannot handle anything with await and that it needs a return type from the Task to properly run (hence the null as result). Therefore, I need a way around that. Other ideas welcome...
Well. I am stupid. Just 5 minutes in, I realize that.
I just did the same while locking a helper object before changing the counter in any way and now it works...
private static int counter;
private static object locker = new object();
// [...]
task.ContinueWith(t =>
{
lock(locker)
counter--;
tcs.SetResult(null);
});
// [...]
private static Task DoTaskWork()
{
lock (locker)
counter++;
return Task.Delay(500);
}
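The lock works because counter++ and counter-- are read-modify-write operations, not atomic ones, and the continuations run on several thread-pool threads at once. As an alternative sketch, the Interlocked class performs the same updates atomically without a lock object. The class name CounterDemo and the method Run are hypothetical. This version also waits on the tasks returned by ContinueWith directly, which is an assumption about what the surrounding code needs and makes the TaskCompletionSource unnecessary here.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int counter;

    // Mirrors the shape of DoTaskWork in the question, with an atomic increment.
    private static Task DoTaskWork()
    {
        Interlocked.Increment(ref counter);
        return Task.Delay(50);
    }

    // Starts `count` work items, decrements atomically in each continuation,
    // waits for all continuations, and returns the final counter value.
    public static int Run(int count)
    {
        counter = 0;
        Task[] continuations = Enumerable.Range(0, count)
            .Select(_ => DoTaskWork().ContinueWith(t => Interlocked.Decrement(ref counter)))
            .ToArray();
        Task.WaitAll(continuations);
        return counter;
    }
}
```

With atomic updates, every increment is matched by exactly one decrement, so CounterDemo.Run(10000) should return 0.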
I need to create loads of tasks without starting them ... Therefore, I need a way around that. Other ideas welcome...
So, if I read it correctly, you want to build a list of tasks without actually running them on creation. You could do that by building a list of Func<Task> objects that you invoke when required:
async Task Main()
{
// Create list of work to do later
var tasks = new List<Func<Task>>();
// Schedule some work
tasks.Add(() => DoTaskWork(1));
tasks.Add(() => DoTaskWork(2));
// Wait for user input before doing work to demonstrate they are not started right away
Console.ReadLine();
// Execute and wait for the completion of the work to be done
await Task.WhenAll(tasks.Select(t => t.Invoke()));
Console.WriteLine("Ready");
}
public async Task DoTaskWork(int taskNr)
{
await Task.Delay(100);
Console.WriteLine(taskNr);
}
This will work, even if you use Task.Run like this:
public Task DoTaskWork(int taskNr)
{
return Task.Run(() =>
{
Thread.Sleep(100); Console.WriteLine(taskNr);
});
}
If this is not what you want, can you elaborate more on the tasks you want to create?

Limit total concurrent tasks running [duplicate]

I have a method Create which is executed whenever a new message is seen on the service bus message queue (https://azure.microsoft.com/en-us/services/service-bus/).
I am trying to limit the total number of concurrent tasks that can run in parallel for all calls of Create to 5 tasks.
In my code Parallel.ForEach does not seem to do anything.
I have tried to add a mutex/lock around the makePdfAsync() invocation like this:
mutex.WaitOne();
if(curretNumTasks < MaxTasks)
{
tasks.Add(makePdfAsync(form));
}
mutex.ReleaseMutex();
but it is extremely slow and makes the service bus throw.
How do I limit the number of concurrent tasks all invocations of Create creates?
public async Task Create(List<FormModel> forms)
{
var tasks = new List<Task>();
Parallel.ForEach(forms, new ParallelOptions { MaxDegreeOfParallelism = 5 }, form =>
{
tasks.Add(makePdfAsync(form));
});
await Task.WhenAny(Task.WhenAll(tasks), Task.Delay(TimeSpan.FromMinutes(10)));
}
public async Task makePdfAsync(FormModel form)
{
var message = new PdfMessageModel();
message.forms = new List<FormModel>() { form };
var retry = 10;
var uri = new Uri("http://localhost.:8007");
var json = JsonConvert.SerializeObject(message);
using (var wc = new WebClient())
{
wc.Encoding = System.Text.Encoding.UTF8;
// reconnect with delay in case process is not ready
while (true)
{
try
{
await wc.UploadStringTaskAsync(uri, json);
break;
}
catch
{
if (retry-- == 0) throw;
}
}
}
}
TL;DR. Create is a method on a class, it is called on many instances simultaneously. The concurrency is two fold; Several invocations of Create simultaneously and within each invocation of Create several tasks run concurrently.
How do I limit the total number of tasks running at any one point?
You could look at using a system wide semaphore?
for example :
var throttle = new Semaphore(5,5,"pdftaskthrottle");
if (throttle.WaitOne(5000)){
try{
//do some task / thread stuff
.....
} catch(Exception ex){
// handle
} finally {
//always remember to release the semaphore
throttle.Release();
}
} else {
//we timed out ... try again?
}
If I understand you correctly, you effectively want a producer/consumer queue with a limit of 5 tasks. BlockingCollection would be a good fit if that's what you're after. It performs well because internally it uses SemaphoreSlim to do the blocking when necessary. You can also combine it with Task, e.g. by creating a BlockingCollection<Task<T>>. "C# in a Nutshell" has a good section on this; see the code below as a general example. Also try to avoid kernel-mode synchronisation constructs such as Mutex where possible, as they are slow (you pay for the transition from managed code into native code).
class PCQueue : IDisposable
{
private BlockingCollection<Task> _taskQueue = new BlockingCollection<Task>();
public PCQueue(int workerCount)
{
for (int i = 0; i < workerCount; i++)
Task.Factory.StartNew(Consume);
}
public Task Enqueue(Action action, CancellationToken cancelToken = default(CancellationToken))
{
//! A task object can either be generated using TaskCompletionSource or instantiated directly (an unstarted or cold task!).
var task = new Task(action, cancelToken);
_taskQueue.Add(task); //? Create a cold task and enqueue it.
return task;
}
public Task<TResult> Enqueue<TResult>(Func<TResult> func, CancellationToken cancelToken = default(CancellationToken))
{
var task = new Task<TResult>(func, cancelToken);
_taskQueue.Add(task);
return task;
}
void Consume()
{
foreach (var task in _taskQueue.GetConsumingEnumerable())
{
try
{
//! We run the task synchronously on the consumer's thread.
if (!task.IsCanceled) task.RunSynchronously();
}
catch (InvalidOperationException)
{
//! Handle the unlikely event that the task is canceled in between checking whether it's canceled and running it.
// race condition!
}
}
}
public void Dispose() => _taskQueue.CompleteAdding();
}
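Because the work in the question is asynchronous I/O, another option is a single static SemaphoreSlim shared by every invocation of Create, so the limit applies across all callers in the process without blocking threads. The following is a minimal sketch under that assumption; the names Throttle, Gate and RunAsync are hypothetical, and the limit of 5 comes from the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // One semaphore for the whole process, so the limit spans every caller.
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(5, 5);

    // Waits asynchronously for a free slot, runs the work, then frees the slot.
    public static async Task RunAsync(Func<Task> work)
    {
        await Gate.WaitAsync();
        try
        {
            await work();
        }
        finally
        {
            // Release even if the work throws, so a failure never leaks a slot.
            Gate.Release();
        }
    }
}
```

Inside Create, each item could then be started with tasks.Add(Throttle.RunAsync(() => makePdfAsync(form))) from an ordinary foreach loop. Parallel.ForEach is not needed for this, since starting an async call returns as soon as it reaches its first incomplete await.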

How can I guarantee continuations run in task completion order?

If I have code that abstracts a staged sequence of asynchronous operations by returning a Task representing each stage, how can I ensure that continuations execute in stage order (i.e. the order in which the Tasks are completed)?
Note that this is a different requirement from simply 'not wasting time waiting for slower tasks'. The order needs to be guaranteed without race conditions in the scheduling. That looser requirement could be addressed by parts of the answers to the following questions:
Sort Tasks into order of completition
Is there default way to get first task that finished successfully?
I think the logical solution would be to attach the continuations using a custom TaskScheduler (such as one based on a SynchronizationContext). However, I can't find any assurance that the scheduling of continuations is performed synchronously upon task completion.
In code this could be something like
class StagedOperationSource
{
public TaskCompletionSource Connect = new TaskCompletionSource();
public TaskCompletionSource Accept = new TaskCompletionSource();
public TaskCompletionSource Complete = new TaskCompletionSource();
}
class StagedOperation
{
public Task Connect, Accept, Complete;
public StagedOperation(StagedOperationSource source)
{
Connect = source.Connect.Task;
Accept = source.Accept.Task;
Complete = source.Complete.Task;
}
}
...
private StagedOperation InitiateStagedOperation(int opId)
{
var source = new StagedOperationSource();
Task.Run(GetRunnerFromOpId(opId, source));
return new StagedOperation(source);
}
...
public void RunOperations()
{
for (int i=0; i<3; i++)
{
int id = i; // copy the loop variable so each continuation captures its own value
var op = InitiateStagedOperation(id);
op.Connect.ContinueWith(t => Console.WriteLine("{0}: Connected", id));
op.Accept.ContinueWith(t => Console.WriteLine("{0}: Accepted", id));
op.Complete.ContinueWith(t => Console.WriteLine("{0}: Completed", id));
}
}
which should produce output similar to
0: Connected
1: Connected
0: Accepted
2: Connected
0: Completed
1: Accepted
2: Accepted
2: Completed
1: Completed
Obviously the example is missing details like forwarding exceptions to (or cancelling) later stages if an earlier stage fails, but it's just an example.
Just await each stage before going onto the next...
public static async Task ProcessStagedOperation(StagedOperation operation, int i)
{
await operation.Connect;
Console.WriteLine("{0}: Connected", i);
await operation.Accept;
Console.WriteLine("{0}: Accepted", i);
await operation.Complete;
Console.WriteLine("{0}: Completed", i);
}
You can then call that method in your for loop.
If you use TAP (the Task-based Asynchronous Pattern), i.e. async and await, you can make the flow of processing a lot more apparent. In this case I would create a new method to encapsulate the order of operations:
public async Task ProcessStagedOperation(StagedOperation op, int i)
{
await op.Connect;
Console.WriteLine("{0}: Connected", i);
await op.Accept;
Console.WriteLine("{0}: Accepted", i);
await op.Complete;
Console.WriteLine("{0}: Completed", i);
}
Now your processing loop gets simplified a bit:
public async Task RunOperations()
{
List<Task> pendingOperations = new List<Task>();
for (int i=0; i<3; i++)
{
var op = InitiateStagedOperation(i);
pendingOperations.Add(ProcessStagedOperation(op, i));
}
await Task.WhenAll(pendingOperations); // finish
}
You now have a reference to a task object you can explicitly wait or simply await from another context. (or you can simply ignore it). The way I modified the RunOperations() method allows you to create a large queue of pending tasks but not block while you wait for them all to finish.

TPL DataFlow Workflow

I have just started reading about TPL Dataflow and I find it really confusing. I have read many articles on the topic but cannot digest it easily. Maybe it is difficult, or maybe I just haven't grasped the idea yet.
The reason I started looking into this is that I wanted to implement a scenario where tasks run in parallel but in order, and I found that TPL Dataflow can be used for this.
I am practicing both TPL and TPL Dataflow and am at a very beginner level, so I need help from experts who could point me in the right direction. In the test method I wrote, I have done the following:
private void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
txtOutput.Clear();
ExecutionDataflowBlockOptions execOptions = new ExecutionDataflowBlockOptions();
execOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;
ActionBlock<string> actionBlock = new ActionBlock<string>(async v =>
{
await Task.Delay(200);
await Task.Factory.StartNew(
() => txtOutput.Text += v + Environment.NewLine,
CancellationToken.None,
TaskCreationOptions.None,
scheduler
);
}, execOptions);
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i.ToString());
}
actionBlock.Complete();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
Now the procedure is both parallel and asynchronous (it does not freeze my UI), but the output is not in order, whereas I have read that TPL Dataflow keeps the order of the elements by default. So my guess is that the Task I have created is the culprit and it is not outputting the strings in the correct order. Am I right?
If this is the case, how do I make this both asynchronous and in order?
I have tried to separate the code and distribute it into different methods, but this attempt failed: only a string is output to the textbox and nothing else happens.
private async void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
await TPLDataFlowOperation();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
public async Task TPLDataFlowOperation()
{
var actionBlock = new ActionBlock<int>(async values => txtOutput.Text += await ProcessValues(values) + Environment.NewLine,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded, TaskScheduler = scheduler });
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i);
}
actionBlock.Complete();
await actionBlock.Completion;
}
private async Task<string> ProcessValues(int i)
{
await Task.Delay(200);
return "Test " + i;
}
I know I have written a bad piece of code but this is the first time I am experimenting with TPL Dataflow.
How do I make this Asynchronous and in order?
This is something of a contradiction. You can make concurrent tasks start in order, but you can't really guarantee that they will run or complete in order.
Let's examine your code and see what's happening.
First, you've selected DataflowBlockOptions.Unbounded. This tells TPL Dataflow that it shouldn't limit the number of tasks that it allows to run concurrently. Therefore, each of your tasks will start at more-or-less the same time, in order.
Your asynchronous operation begins with await Task.Delay(200). This will cause your method to be suspended and then resume after about 200 ms. However, this delay is not exact, and will vary from one invocation to the next. Also, the mechanism by which your code is resumed after the delay may take a variable amount of time. Because of this random variation in the actual delay, the next bit of code to run is no longer in order, resulting in the discrepancy you're seeing.
You might find this example interesting. It's a console application to simplify things a bit.
class Program
{
static void Main(string[] args)
{
OutputNumbersWithDataflow();
OutputNumbersWithParallelLinq();
Console.ReadLine();
}
private static async Task HandleStringAsync(string s)
{
await Task.Delay(200);
Console.WriteLine("Handled {0}.", s);
}
private static void OutputNumbersWithDataflow()
{
var block = new ActionBlock<string>(
HandleStringAsync,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
for (int i = 0; i < 20; i++)
{
block.Post(i.ToString());
}
block.Complete();
block.Completion.Wait();
}
private static string HandleString(string s)
{
// Perform some computation on s...
Thread.Sleep(200);
return s;
}
private static void OutputNumbersWithParallelLinq()
{
var myNumbers = Enumerable.Range(0, 20).AsParallel()
.AsOrdered()
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.WithMergeOptions(ParallelMergeOptions.NotBuffered);
var processed = from i in myNumbers
select HandleString(i.ToString());
foreach (var s in processed)
{
Console.WriteLine(s);
}
}
}
The first set of numbers is calculated using a method rather similar to yours—with TPL Dataflow. The numbers are out-of-order.
The second set of numbers, output by OutputNumbersWithParallelLinq(), doesn't use Dataflow at all. It relies on the Parallel LINQ features built into .NET. This runs my HandleString() method on background threads, but keeps the data in order through to the end.
The limitation here is that PLINQ doesn't let you supply an async method. (Well, you could, but it wouldn't give you the desired behavior.) HandleString() is a conventional synchronous method; it just gets executed on a background thread.
And here's a more complex Dataflow example that does preserve the correct order:
private static void OutputNumbersWithDataflowTransformBlock()
{
Random r = new Random();
var transformBlock = new TransformBlock<string, string>(
async s =>
{
// Make the delay extra random, just to be sure.
await Task.Delay(160 + r.Next(80));
return s;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
// For a GUI application you should also set the
// scheduler here to make sure the output happens
// on the correct thread.
var outputBlock = new ActionBlock<string>(
s => Console.WriteLine("Handled {0}.", s),
new ExecutionDataflowBlockOptions
{
SingleProducerConstrained = true,
MaxDegreeOfParallelism = 1
});
transformBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });
for (int i = 0; i < 20; i++)
{
transformBlock.Post(i.ToString());
}
transformBlock.Complete();
outputBlock.Completion.Wait();
}
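For the simple case in the question, where every value is processed independently and only the final output needs to be ordered, a sketch without Dataflow is also possible: start all operations, then await Task.WhenAll, which returns the results in the order the tasks were supplied regardless of the order in which they completed. The names OrderedResults, ProcessValueAsync and ProcessAllAsync are hypothetical, and the varying delay is only there to force out-of-order completion.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class OrderedResults
{
    // Same shape as ProcessValues in the question; the delay varies so that
    // completion order differs from start order.
    private static async Task<string> ProcessValueAsync(int i)
    {
        await Task.Delay(10 + (i * 7) % 40);
        return "Test " + i;
    }

    // Starts all operations concurrently and returns the results in input order.
    public static Task<string[]> ProcessAllAsync(int from, int count) =>
        Task.WhenAll(Enumerable.Range(from, count).Select(ProcessValueAsync));
}
```

In the click handler this could be used as txtOutput.Text = string.Join(Environment.NewLine, await OrderedResults.ProcessAllAsync(1, 100)); which updates the textbox once, on the UI thread, after all operations have finished. Unlike the TransformBlock pipeline above, it does not stream results as they become available.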
