If I have code that abstracts a staged sequence of asynchronous operations by returning a Task representing each stage, how can I ensure that continuations execute in stage order (i.e. the order in which the Tasks are completed)?
Note that this is a different requirement from simply 'not wasting time waiting for slower tasks'. The order needs to be guaranteed without race conditions in the scheduling. This looser requirement could addressed by parts of the answers to the following questions:
Sort Tasks into order of completition
Is there default way to get first task that finished successfully?
I think the logical solution would be to attach the continuations using a custom TaskScheduler (such as one based on a SynchronizationContext). However, I can't find any assurance that the scheduling of continuations is performed synchronously upon task completion.
In code this could be something like
class StagedOperationSource
{
public TaskCompletionSource Connect = new TaskCompletionSource();
public TaskCompletionSource Accept = new TaskCompletionSource();
public TaskCompletionSource Complete = new TaskCompletionSource();
}
class StagedOperation
{
public Task Connect, Accept, Complete;
public StagedOperation(StagedOperationSource source)
{
Connect = source.Connect.Task;
Accept = source.Accept.Task;
Complete = source.Complete.Task;
}
}
...
private StagedOperation InitiateStagedOperation(int opId)
{
var source = new StagedOperationSource();
Task.Run(GetRunnerFromOpId(opId, source));
return new StagedOperation(source);
}
...
public RunOperations()
{
for (int i=0; i<3; i++)
{
var op = InitiateStagedOperation(i);
op.Connect.ContinueWith(t => Console.WriteLine("{0}: Connected", i));
op.Accept.ContinueWith(t => Console.WriteLine("{0}: Accepted", i));
op.Complete.ContinueWith(t => Console.WriteLine("{0}: Completed", i));
}
}
which should produce output similar to
0: Connected
1: Connected
0: Accepted
2: Connected
0: Completed
1: Accepted
2: Accepted
2: Completed
1: Completed
Obviously the example is missing details like forwarding exceptions to (or cancelling) later stages if an earlier stage fails, but its just an example.
Just await each stage before going onto the next...
public static async Task ProcessStagedOperation(StagedOperation operation, int i)
{
await operation.Connect;
Console.WriteLine("{0}: Connected", i);
await operation.Accept;
Console.WriteLine("{0}: Accepted", i);
await operation.Complete;
Console.WriteLine("{0}: Completed", i);
}
You can then call that method in your for loop.
If you use TAP (Task Asynchronous Programming), i.e. async and await, you can make the flow of processing a lot more apparent. In this case I would create a new method to encapsulate the order of operations:
public async Task ProcessStagedOperation(StagedOperation op, int i)
{
await op.Connect;
Console.WriteLine("{0}: Connected", i);
await op.Accept;
Console.WriteLine("{0}: Accepted", i)
await op.Complete;
Console.WriteLine("{0}: Completed", i)
}
Now your processing loop gets simplified a bit:
public async Task RunOperations()
{
List<Task> pendingOperations = new List<Task>();
for (int i=0; i<3; i++)
{
var op = InitiateStagedOperation(i);
pendingOperations.Add(ProcessStagedOperation(op, i));
}
await Task.WhenAll(pendingOperations); // finish
}
You now have a reference to a task object you can explicitly wait or simply await from another context. (or you can simply ignore it). The way I modified the RunOperations() method allows you to create a large queue of pending tasks but not block while you wait for them all to finish.
Related
I started to have HUGE doubts regarding my code and I need some advice from more experienced programmers.
In my application on the button click, the application runs a command, that is calling a ScrapJockeys method:
if (UpdateJockeysPl) await ScrapJockeys(JPlFrom, JPlTo + 1, "jockeysPl"); //1 - 1049
ScrapJockeys is triggering a for loop, repeating code block between 20K - 150K times (depends on the case). Inside the loop, I need to call a service method, where the execution of the method takes a lot of time. Also, I wanted to have the ability of cancellation of the loop and everything that is going on inside of the loop/method.
Right now I am with a method with a list of tasks, and inside of the loop is triggered a Task.Run. Inside of each task, I am calling an awaited service method, which reduces execution time of everything to 1/4 comparing to synchronous code. Also, each task has assigned a cancellation token, like in the example GitHub link:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
int j = i;
Task task = Task.Run(async () =>
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j);
//doing some stuff with results in here
}, TokenSource.Token);
tasks.Add(task);
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
So about my question, is there everything fine with my code? According to this article:
Many async newbies start off by trying to treat asynchronous tasks the
same as parallel (TPL) tasks and this is a major misstep.
What should I use then?
And according to this article:
On a busy server, this kind of implementation can kill scalability.
So how am I supposed to do it?
Please be noted, that the service interface method signature is Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index);
And also I am not 100% sure that I am using Task.Run correctly within my service class. The methods inside are wrapping the code inside await Task.Run(() =>, like in the example GitHub link:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
await Task.Run(() =>
{
//do some time consuming things
});
return jockey;
}
As far as I understand from the articles, this is a kind of anti-pattern. But I am confused a bit. Based on this SO reply, it should be fine...? If not, how to replace it?
On the UI side, you should be using Task.Run when you have CPU-bound code that is long enough that you need to move it off the UI thread. This is completely different than the server side, where using Task.Run at all is an anti-pattern.
In your case, all your code seems to be I/O-based, so I don't see a need for Task.Run at all.
There is a statement in your question that conflicts with the provided code:
I am calling an awaited service method
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
await Task.Run(() =>
{
//do some time consuming things
});
}
The lambda passed to Task.Run is not async, so the service method cannot possibly be awaited. And indeed it is not.
A better solution would be to load the HTML asynchronously (e.g., using HttpClient.GetStringAsync), and then call HtmlDocument.LoadHtml, something like this:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
...
string link = sb.ToString();
var html = await httpClient.GetStringAsync(link).ConfigureAwait(false);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
if (jockey.Name == null)
...
return jockey;
}
And also remove the Task.Run from your for loop:
private async Task ScrapJockey(string dataType)
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j).ConfigureAwait(false);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j).ConfigureAwait(false);
//doing some stuff with results in here
}
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
tasks.Add(ScrapJockey(dataType));
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
As far as I understand from the articles, this is a kind of anti-pattern.
It is an anti-pattern. But if can't modify the service implementation, you should at least be able to execute the tasks in parallel. Something like this:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
ConcurrentBag<Task> tasks = new ConcurrentBag<Task>();
ParallelOptions parallelLoopOptions = new ParallelOptions() { CancellationToken = CancellationToken };
Parallel.For(startIndex, stopIndex, parallelLoopOptions, i =>
{
int j = i;
switch (dataType)
{
case "jockeysPl":
tasks.Add(_scrapServices.ScrapSingleJockeyPlAsync(j));
break;
case "jockeysCz":
tasks.Add(_scrapServices.ScrapSingleJockeyCzAsync(j));
break;
}
});
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
I'm new to tasks and have a question regarding the usage. Does the Task.Factory fire for all items in the foreach loop or block at the 'await' basically making the program single threaded? If I am thinking about this correctly, the foreach loop starts all the tasks and the .GetAwaiter().GetResult(); is blocking the main thread until the last task is complete.
Also, I'm just wanting some anonymous tasks to load the data. Would this be a correct implementation? I'm not referring to exception handling as this is simply an example.
To provide clarity, I am loading data into a database from an outside API. This one is using the FRED database. (https://fred.stlouisfed.org/), but I have several I will hit to complete the entire transfer (maybe 200k data points). Once they are done I update the tables, refresh market calculations, etc. Some of it is real time and some of it is End-of-day. I would also like to say, I currently have everything working in docker, but have been working to update the code using tasks to improve execution.
class Program
{
private async Task SQLBulkLoader()
{
foreach (var fileListObj in indicators.file_list)
{
await Task.Factory.StartNew( () =>
{
string json = this.GET(//API call);
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = fileListObj.series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", fileListObj.series_id);
}
});
}
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
Your awaiting the task returned from Task.Factory.StartNew does make it effectively single threaded. You can see a simple demonstration of this with this short LinqPad example:
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
await Task.Run(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
});
}
Here we wait less as we progress through the loop. The output is:
0 inline
0 in thread
1 inline
1 in thread
2 inline
2 in thread
If you remove the await in front of StartNew you'll see it runs in parallel. As others have mentioned, you can certainly use Parallel.ForEach, but for a demonstration of doing it a bit more manually, you can consider a solution like this:
var tasks = new List<Task>();
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
tasks.Add(Task.Factory.StartNew(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
}));
}
Task.WaitAll(tasks.ToArray());
Notice now how the result is:
0 inline
1 inline
2 inline
2 in thread
1 in thread
0 in thread
This is a typical problem that C# 8.0 Async Streams are going to solve very soon.
Until C# 8.0 is released, you can use the AsyncEnumarator library:
using System.Collections.Async;
class Program
{
private async Task SQLBulkLoader() {
await indicators.file_list.ParallelForEachAsync(async fileListObj =>
{
...
await s.WriteToServerAsync(dataTableConversion);
...
},
maxDegreeOfParalellism: 3,
cancellationToken: default);
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
I do not recommend using Parallel.ForEach and Task.WhenAll as those functions are not designed for asynchronous streaming.
You'll want to add each task to a collection and then use Task.WhenAll to await all of the tasks in that collection:
private async Task SQLBulkLoader()
{
var tasks = new List<Task>();
foreach (var fileListObj in indicators.file_list)
{
tasks.Add(Task.Factory.StartNew( () => { //Doing Stuff }));
}
await Task.WhenAll(tasks.ToArray());
}
My take on this: most time consuming operations will be getting the data using a GET operation and the actual call to WriteToServer using SqlBulkCopy. If you take a look at that class you will see that there is a native async method WriteToServerAsync method (docs here)
. Always use those before creating Tasks yourself using Task.Run.
The same applies to the http GET call. You can use the native HttpClient.GetAsync (docs here) for that.
Doing that you can rewrite your code to this:
private async Task ProcessFileAsync(string series_id)
{
string json = await GetAsync();
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
await s.WriteToServerAsync(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", series_id);
}
}
private async Task SQLBulkLoaderAsync()
{
var tasks = indicators.file_list.Select(f => ProcessFileAsync(f.series_id));
await Task.WhenAll(tasks);
}
Both operations (http call and sql server call) are I/O calls. Using the native async/await pattern there won't even be a thread created or used, see this question for a more in-depth explanation. That is why for IO bound operations you should never have to use Task.Run (or Task.Factory.StartNew. But do mind that Task.Run is the recommended approach).
Sidenote: if you are using HttpClient in a loop, please read this about how to correctly use it.
If you need to limit the number of parallel actions you could also use TPL Dataflow as it plays very nice with Task based IO bound operations. The SQLBulkLoaderAsyncshould then be modified to (leaving the ProcessFileAsync method from earlier this answer intact):
private async Task SQLBulkLoaderAsync()
{
var ab = new ActionBlock<string>(ProcessFileAsync, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (var file in indicators.file_list)
{
ab.Post(file.series_id);
}
ab.Complete();
await ab.Completion;
}
Use a Parallel.ForEach loop to enable data parallelism over any System.Collections.Generic.IEnumerable<T> source.
// Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
Parallel.ForEach(fileList, (currentFile) =>
{
//Doing Stuff
Console.WriteLine("Processing {0} on thread {1}", currentFile, Thread.CurrentThread.ManagedThreadId);
});
Why didnt you try this :), this program will not start parallel tasks (in a foreach), it will be blocking but the logic in task will be done in separate thread from threadpool (only one at the time, but the main thread will be blocked).
The proper approach in your situation is to use Paraller.ForEach
How can I convert this foreach code to Parallel.ForEach?
I have been researching (including looking at all other SO posts on this topic) the best way to implement a (most likely) Windows Service worker that will pull items of work from a database and process them in parallel asynchronously in a 'fire-and-forget' manner in the background (the work item management will all be handled in the asynchronous method). The work items will be web service calls and database queries. There will be some throttling applied to the producer of these work items to ensure some kind of measured approach to scheduling the work. The examples below are very basic and are just there to highlight the logic of the while loop and for loop in place. Which is the ideal method or does it not matter? Is there a more appropriate/performant way of achieving this?
async/await...
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Async";
Task.Run(() => AsyncMain());
Console.ReadLine();
}
private static async void AsyncMain()
{
while (true)
{
// Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = DoSomethingAsync(counter.ToString());
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static async Task<string> DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
return "fire and forget so not really interested";
}
Task.Run...
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Task";
while (true)
{
// Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = Task.Run(() => { DoSomethingAsync(counter.ToString()); });
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static string DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
return "fire and forget so not really interested";
}
pull items of work from a database and process them in parallel asynchronously in a 'fire-and-forget' manner in the background
Technically, you want concurrency. Whether you want asynchronous concurrency or parallel concurrency remains to be seen...
The work items will be web service calls and database queries.
The work is I/O-bound, so that implies asynchronous concurrency as the more natural approach.
There will be some throttling applied to the producer of these work items to ensure some kind of measured approach to scheduling the work.
The idea of a producer/consumer queue is implied here. That's one option. TPL Dataflow provides some nice producer/consumer queues that are async-compatible and support throttling.
Alternatively, you can do the throttling yourself. For asynchronous code, there's a built-in throttling mechanism called SemaphoreSlim.
TPL Dataflow approach, with throttling:
private static int counter = 1;
static void Main(string[] args)
{
Console.Title = "Async";
var x = Task.Run(() => MainAsync());
Console.ReadLine();
}
private static async Task MainAsync()
{
var blockOptions = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 7
};
var block = new ActionBlock<string>(DoSomethingAsync, blockOptions);
while (true)
{
var dbData = await ...; // Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
block.Post(counter.ToString());
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static async Task DoSomethingAsync(string jobNumber)
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
}
Asynchronous concurrency approach with manual throttling:
private static int counter = 1;
private static SemaphoreSlim semaphore = new SemaphoreSlim(7);
static void Main(string[] args)
{
Console.Title = "Async";
var x = Task.Run(() => MainAsync());
Console.ReadLine();
}
private static async Task MainAsync()
{
while (true)
{
var dbData = await ...; // Imagine calling a database to get some work items to do, in this case 5 dummy items
for (int i = 0; i < 5; i++)
{
var x = DoSomethingAsync(counter.ToString());
counter++;
Thread.Sleep(50);
}
Thread.Sleep(1000);
}
}
private static async Task DoSomethingAsync(string jobNumber)
{
await semaphore.WaitAsync();
try
{
try
{
// Simulated mostly IO work - some could be long running
await Task.Delay(5000);
Console.WriteLine(jobNumber);
}
catch (Exception ex)
{
LogException(ex);
}
Log("job {0} has completed", jobNumber);
}
finally
{
semaphore.Release();
}
}
As a final note, I hardly ever recommend my own book on SO, but I do think it would really benefit you. In particular, sections 8.10 (Blocking/Asynchronous Queues), 11.5 (Throttling), and 4.4 (Throttling Dataflow Blocks).
First of all, let's fix some.
In the second example you are calling
Task.Delay(5000);
without await. It is a bad idea. It creates a new Task instance which runs for 5 seconds but no one is waiting for it. Task.Delay is only useful with await. Mind you, do not use Task.Delay(5000).Wait() or you are going to get deadlocked.
In your second example you are trying to make the DoSomethingAsync method synchronous, lets call it DoSomethingSync and replace the Task.Delay(5000); with Thread.Sleep(5000);
Now, the second example is almost the old-school ThreadPool.QueueUserWorkItem. And there is nothing bad with it in case you are not using some already-async API inside. Task.Run and ThreadPool.QueueUserWorkItem used in the fire-and-forget case are just the same thing. I would use the latter for clarity.
This slowly drives us to the answer to the main question. Async or not async - this is the question! I would say: "Do not create async methods in case you do not have to use some async IO inside your code". If however there is async API you have to use than the first approach would be more expected by those who are going to read your code years later.
Let's consider the method:
Task Foo(IEnumerable items, CancellationToken token)
{
return Task.Run(() =>
{
foreach (var i in items)
token.ThrowIfCancellationRequested();
}, token);
}
Then I have a consumer:
var cts = new CancellationTokenSource();
var task = Foo(Items, cts.token);
task.Wait();
And the example of Items:
IEnumerable Items
{
get
{
yield return 0;
Task.Delay(Timeout.InfiniteTimeSpan).Wait();
yield return 1;
}
}
What about task.Wait?
I cannot put my cancel token into collection of items.
How to kill the not responding task or get around this?
I found one solution that allows to put cancellation token into Items originating from thid parties:
public static IEnumerable<T> ToCancellable<T>(this IEnumerable<T> #this, CancellationToken token)
{
var enumerator = #this.GetEnumerator();
for (; ; )
{
var task = Task.Run(() => enumerator.MoveNext(), token);
task.Wait(token);
if (!task.Result)
yield break;
yield return enumerator.Current;
}
}
Now I need to use:
Items.ToCancellable(cts.token)
And that will not hang after cancel request.
You can't really cancel a non-cancellable operation. Stephen Toub goes into details in "How do I cancel non-cancelable async operations?" on the Parallel FX Team's blog but the essence is that you need to understand what you actually want to do?
Stop the asynchronous/long-running operation itself? Not doable in a cooperative way, if you don't have a way to signal the operation
Stop waiting for the operation to finish, ignoring any results? That's doable, but can lead to unreliability for obvious reasons. You can start a Task with the long operation passing a cancellation token, or use a TaskCompletionSource as Stephen Toub describes.
You need to decide which behavior you want to find the proper solution
Why can't you pass the CancellationToken to Items()?
IEnumerable Items(CancellationToken ct)
{
yield return 0;
Task.Delay(Timeout.InfiniteTimeSpan, ct).Wait();
yield return 1;
}
You would have to pass the same token to Items() as you pass to Foo(), of course.
Try using a TaskCompletionSource and returning that. You can then set the TaskCompletionSource to the result (or the error) of the inner task if it runs to completion (or faults). But you can set it to canceled immediately if the CancellationToken gets triggered.
Task<int> Foo(IEnumerable<int> items, CancellationToken token)
{
var tcs = new TaskCompletionSource<int>();
token.Register(() => tcs.TrySetCanceled());
var innerTask = Task.Factory.StartNew(() =>
{
foreach (var i in items)
token.ThrowIfCancellationRequested();
return 7;
}, token);
innerTask.ContinueWith(task => tcs.TrySetResult(task.Result), TaskContinuationOptions.OnlyOnRanToCompletion);
innerTask.ContinueWith(task => tcs.TrySetException(task.Exception), TaskContinuationOptions.OnlyOnFaulted);
return tcs.Task;
}
This won't actually kill the inner task, but it'll give you a task that you can continue from immediately on cancellation. To kill the inner task since it's hanging out in an infinite timeout, I believe the only thing you can do is to grab a reference to Thread.CurrentThread where you start the task, and then call taskThread.Abort() from within Foo, which of course is bad practice. But in this case your question really comes down to "how can I make a long running function terminate without having access to the code", which is only doable via Thread.Abort.
Can you have Items be IEnumerable<Task<int>> instead of IEnumerable<int>? Then you could do
return Task.Run(() =>
{
foreach (var task in tasks)
{
task.Wait(token);
token.ThrowIfCancellationRequested();
var i = task.Result;
}
}, token);
Although something like this may be more straightforward to do using Reactive Framework and doing items.ToObservable. That would look like this:
static Task<int> Foo(IEnumerable<int> items, CancellationToken token)
{
var sum = 0;
var tcs = new TaskCompletionSource<int>();
var obs = items.ToObservable(ThreadPoolScheduler.Instance);
token.Register(() => tcs.TrySetCanceled());
obs.Subscribe(i => sum += i, tcs.SetException, () => tcs.TrySetResult(sum), token);
return tcs.Task;
}
How about creating a wrapper around the enumerable that is itself cancellable between items?
IEnumerable<T> CancellableEnum<T>(IEnumerable<T> items, CancellationToken ct) {
foreach (var item in items) {
ct.ThrowIfCancellationRequested();
yield return item;
}
}
...though that seems to be kind of what Foo() already does. If you have some place where this enumerable blocks literally infinitely (and it's not just very slow), then what you would do is add a timeout and/or a cancellation token to the task.Wait() on the consumer side.
My previous solution was based on an optimistic assumption that the enumerable is likely to not hang and is quite fast. Thus we could sometimes sucrifice one thread of the system's thread pool? As Dax Fohl pointed out, the task will be still active even if its parent task has been killed by cancel exception. And in this regard, that could chock up the underlying ThreadPool, which is used by default task scheduler, if several collections have been frozen indefinitely.
Consequently I have refactored ToCancellable method:
public static IEnumerable<T> ToCancellable<T>(this IEnumerable<T> #this, CancellationToken token)
{
var enumerator = #this.GetEnumerator();
var state = new State();
for (; ; )
{
token.ThrowIfCancellationRequested();
var thread = new Thread(s => { ((State)s).Result = enumerator.MoveNext(); }) { IsBackground = true, Priority = ThreadPriority.Lowest };
thread.Start(state);
try
{
while (!thread.Join(10))
token.ThrowIfCancellationRequested();
}
catch (OperationCanceledException)
{
thread.Abort();
throw;
}
if (!state.Result)
yield break;
yield return enumerator.Current;
}
}
And a helping class to manage the result:
class State
{
public bool Result { get; set; }
}
It is safe to abort a detached thread.
The pain, that I see here is a thread creation which is heavy. That could be solved by using custom thread pool along with producer-consumer pattern that will be able to handle abort exceptions in order to remove broken thread from the pool.
Another problem is at Join line. What is the best pause here? Maybe that should be in user charge and shiped as a method argument.
I have two operations - long running OperationA and much quicker OperationB. I was running them in parallel using TAP and returning results as they both finish :
var taskA = Task.Factory.StartNew(()=>OperationA());
var taskB = Task.Factory.StartNew(()=>OperationB());
var tasks = new Task[] { taskA, taskB };
Task.WaitAll(tasks);
// processing taskA.Result, taskB.Result
No magic here. Now what I want to do is repeat OperationB when it's finished indefinitely in case OperationA is still running. So whole procedure finish point will occur when OperationA is finished and last pass of OperationB is finished. I'm looking for some sort of effective pattern for doing that will not involve polling for OperationA's Status in while loop if that's possible. Looking toward improving WaitAllOneByOne pattern proposed in this Pluralsight course or something similar.
Try this
// Get cancellation support.
CancellationTokenSource source = new CancellationTokenSource();
CancellationToken token = source.Token;
// Start off A and set continuation to cancel B when finished.
bool taskAFinished = false;
var taskA = Task.Factory.StartNew(() => OperationA());
Task contA = taskA.ContinueWith(ant => source.Cancel());
// Set off B and in the method perform your loop. Cancellation with be thrown when
// A has completed.
var taskB = Task.Factory.StartNew(() => OperationB(token), token);
Task contB = taskB.ContinueWith(ant =>
{
switch (task.Status)
{
// Handle any exceptions to prevent UnobservedTaskException.
case TaskStatus.RanToCompletion:
// Do stuff.
break;
case TaskStatus.Canceled:
// You know TaskA is finished.
break;
case TaskStatus.Faulted:
// Something bad.
break;
}
});
then in the OperationB method you can perform your loop and include a cancellation upon TaskA's compleation...
private void OperationB(CancellationToken token)
{
foreach (var v in object)
{
...
token.ThrowIfCancellationRequested(); // This must be handeled. AggregateException.
}
}
Note, instead of complicating with a cancellation, you can just set a bool from with in the continuation of TaskA and check for this in TaskB' loop - this will avoid any faffing about with cancellations.
I hope this helps
Took your approach as basis and adapted a bit :
var source = new CancellationTokenSource();
var token = source.Token;
var taskA = Task.Factory.StartNew(
() => OperationA()
);
var taskAFinished = taskA.ContinueWith(antecedent =>
{
source.Cancel();
return antecedent.Result;
});
var taskB = Task.Factory.StartNew(
() => OperationB(token), token
);
var taskBFinished = taskB.ContinueWith(antecedent =>
{
switch (antecedent.Status)
{
case TaskStatus.RanToCompletion:
case TaskStatus.Canceled:
try
{
return ant.Result;
}
catch (AggregateException ae)
{
// Operation was canceled before start if OperationA is short
return null;
}
case TaskStatus.Faulted:
return null;
}
return null;
});
Made two continuations that returns results for respective operations so I could make wait for them both to be finished (tried to do that only with second one and it didn't work).
var tasks = new Task[] { taskAFinished, taskBFinished };
Task.WaitAll(tasks);
First one is just passing antecedent's task Result further, second takes aggregate results available at this point in OperationB (both RanToCompletion and Canceled statuses are considered correct end of process). OperationB now looks like this :
public static List<Result> OperationB(CancellationToken token)
{
var resultsList = new List<Result>();
while (true)
{
foreach (var op in operations)
{
resultsList.Add(op.GetResult();
}
if (token.IsCancellationRequested)
{
return resultsList;
}
}
}
Changes logic a bit - all loops inside OperationB now are considered as single task, but this is easier than keep them atomic and write some sort of coordination primitive that will gather results from each run. In case I don't really care which loop produced which results this seems to be a decent solution. May improve to more flexible implementation later if needed (what I was actually looking for is to chain multiple operations recursively - OperationB itself may have smaller repeating OperationC's inside with same behavior, OperationC - multiple OperationD's that are running when C is active etc.).
edit
Added exception handling in taskBFinished for case when OperationA is quick and cancellation is issued before OperationB is even started.