I have a background service that is started when the application starts up. It creates multiple tasks, one per configured worker. While running various trials and monitoring the open connections on the DB, I noticed that the number of open connections is always the same as the number of workers: if I set 32 workers, the query below always shows 32 open connections. FYI, I am using Postgres as the DB server. To check the open connections while the application is running, I use this query:
select * from pg_stat_activity where application_name = 'myapplication';
Below is the background service code.
public class MessagingService : BackgroundService {
private int worker = 32;
protected override async Task ExecuteAsync(CancellationToken cancellationToken) {
var tasks = new List<Task>();
for (int i=0; i<worker; i++) {
tasks.Add(DoJob(cancellationToken));
}
while (!cancellationToken.IsCancellationRequested) {
try {
var completed = await Task.WhenAny(tasks);
tasks.Remove(completed);
} catch (Exception) {
await Task.Delay(1000, cancellationToken);
}
if (!cancellationToken.IsCancellationRequested) {
tasks.Add(DoJob(cancellationToken));
}
}
}
private async Task DoJob(CancellationToken cancellationToken) {
using (var scope = _services.CreateScope()) {
var service = scope.ServiceProvider
.GetRequiredService<MessageService>();
try {
//do select and update query on db if null return false otherwise send mail
if (!await service.Run(cancellationToken)) {
await Task.Delay(1000, cancellationToken);
}
} catch (Exception) {
await Task.Delay(1000, cancellationToken);
}
}
}
}
The workflow is not right, as it keeps creating tasks and leaves the connections open and idle. The CPU and memory usage are also high while those tasks run. How can I make it so that when no record is found in the DB, only 1 worker keeps running? When one or more records are found, the number of workers should increase up to the preset maximum, and then decrease again when there are fewer records than the maximum number of workers. If this question is too vague or opinion-based, please let me know and I will try my best to make it as specific as possible.
Update Purpose
The purpose of this service is to perform email delivery. Another API is used to create scheduled jobs. Once a job is added to the DB, this service does the email delivery at the scheduled time. E.g., 5k scheduled jobs are added to the DB at '2021-12-31 00:00:00', with a scheduled delivery time of '2021-12-31 08:00:00'. The service keeps looping from 00:00:00 until 08:00:00 with 32 workers running at the same time, and only then starts the email delivery. How can I make it more efficient, so that normally, when no job is scheduled, only 1 worker is running? When it finds 5k scheduled jobs, it should fully utilise all the workers, and after the 5k jobs are completed, it should go back to 1 worker.
My suggestion is to spare yourself from the burden of manually creating and maintaining worker tasks, by using an ActionBlock<T> from the TPL Dataflow library. This component is a combination of an input queue and an Action<T> delegate. You specify the delegate in its constructor, and you feed it with messages with its Post method. The component invokes the delegate for each message it receives, with the specified degree of parallelism. When there are no more messages to send, you notify it by invoking its Complete method, and then await its Completion so that you know that all work that was delegated to it has completed.
Below is a rough demonstration of how you could use this component:
protected override async Task ExecuteAsync(CancellationToken cancellationToken)
{
var processor = new ActionBlock<Job>(async job =>
{
await ProcessJob(job);
await MarkJobAsCompleted(job);
}, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 32
});
try
{
while (true)
{
Task delayTask = Task.Delay(TimeSpan.FromSeconds(60), cancellationToken);
Job[] jobs = await FetchReadyToProcessJobs();
foreach (var job in jobs)
{
await MarkJobAsPending(job);
processor.Post(job);
}
await delayTask; // Will throw when the token is canceled
}
}
finally
{
processor.Complete();
await processor.Completion;
}
}
The FetchReadyToProcessJobs method is supposed to connect to the database, and fetch all the jobs whose time has come to be processed. In the above example this method is invoked every 60 seconds. The Task.Delay is created before invoking the method, and awaited after the returned jobs have been posted to the ActionBlock<T>. This way the interval between invocations will be stable and consistent.
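This also covers the "1 worker when idle, up to 32 when busy" part of the question: an ActionBlock<T> does not keep idle worker tasks running. It only runs as many concurrent invocations as it has queued messages, capped at MaxDegreeOfParallelism. Below is a minimal self-contained sketch of that behavior, assuming a recent .NET with System.Threading.Tasks.Dataflow available (it is also published on NuGet). The 20 integer "jobs", the Task.Delay, and the cap of 4 are placeholders for illustration, not part of the code above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

int current = 0;   // invocations running right now
int peak = 0;      // highest concurrency observed
int processed = 0; // jobs finished

var block = new ActionBlock<int>(async job =>
{
    int now = Interlocked.Increment(ref current);
    // Record the peak concurrency with a compare-and-swap loop.
    int seen;
    while (now > (seen = Volatile.Read(ref peak)) &&
           Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

    await Task.Delay(50); // placeholder for the real work, e.g. sending one email
    Interlocked.Increment(ref processed);
    Interlocked.Decrement(ref current);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

// A burst of 20 jobs: the block scales up to at most 4 concurrent invocations.
for (int i = 0; i < 20; i++) block.Post(i);
block.Complete();
await block.Completion;

Console.WriteLine($"processed={processed}, peak concurrency={peak}");
```

When nothing is posted, no delegate invocations run at all; the only periodic DB access left is the FetchReadyToProcessJobs call in the polling loop.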
Related
I'm not sure if the title makes sense, it was the best I could come up with, so here's my scenario.
I have an ASP.NET Core app that I'm using more as a shell and for DI configuration. In Startup it adds a bunch of IHostedServices as singletons, along with their dependencies, also as singletons, with minor exceptions for SqlConnection and DbContext which we'll get to later. The hosted services are groups of similar services that:
Listen for incoming reports from GPS devices and put into a listening buffer.
Parse items out of the listening buffer and put into a parsed buffer.
Eventually there's a single service that reads the parsed buffer and actually processes the parsed reports. It does this by passing the report it took out of the buffer to a handler and awaiting it before moving to the next one. This has worked well for the past year, but it appears we're running into a scalability issue now because it's processing one report at a time and the average time to process is 62ms on the server, which includes the Dapper trip to the database to get the data needed and the EF Core trip to save changes.
If however the handler decides that a report's information requires triggering background jobs, then I suspect it takes 100ms or more to complete. Over time, the buffer fills up faster than the handler can process to the point of holding 10s if not 100s of thousands of reports until they can be processed. This is an issue because notifications are delayed and because it has the potential for data loss if the buffer is still full by the time the server restarts at midnight.
All that being said, I'm trying to figure out how to make the processing parallel. After lots of experimentation yesterday, I settled on using Parallel.ForEach over the buffer using GetConsumingEnumerable(). This works well, except for a weird behavior I don't know what to do about or even call. As the buffer is filled and the ForEach is iterating over it it will begin to "chunk" the processing into ever increasing multiples of two. The size of the chunking is affected by the MaxDegreeOfParallelism setting. For example (N# = Next # of reports in buffer):
MDP = 1
N3 = 1 at a time
N6 = 2 at a time
N12 = 4 at a time
...
MDP = 2
N6 = 1 at a time
N12 = 2 at a time
N24 = 4 at a time
...
MDP = 4
N12 = 1 at a time
N24 = 2 at a time
N48 = 4 at a time
...
MDP = 8 (my CPU core count)
N24 = 1 at a time
N48 = 2 at a time
N96 = 4 at a time
...
This is arguably worse than the serial execution I have now because by the end of the day it will buffer and wait for, say, half a million reports before actually processing them.
Is there a way to fix this? I'm not very experienced with Parallel.ForEach so from my point of view this is strange behavior. Ultimately I'm looking for a way to parallel process the reports as soon as they are in the buffer, so if there's other ways to accomplish this I'm all ears. This is roughly what I have for the code. The handler that processes the reports does use IServiceProvider to create a scope and get an instance of SqlConnection and DbContext. Thanks in advance for any suggestions!
public sealed class GpsReportService :
IHostedService {
private readonly GpsReportBuffer _buffer;
private readonly Config _config;
private readonly GpsReportHandler _handler;
private readonly ILogger _logger;
public GpsReportService(
GpsReportBuffer buffer,
Config config,
GpsReportHandler handler,
ILogger<GpsReportService> logger) {
_buffer = buffer;
_config = config;
_handler = handler;
_logger = logger;
}
public Task StartAsync(
CancellationToken cancellationToken) {
_logger.LogInformation("GPS Report Service => starting");
Task.Run(Process, cancellationToken).ConfigureAwait(false);// Is ConfigureAwait here correct usage?
_logger.LogInformation("GPS Report Service => started");
return Task.CompletedTask;
}
public Task StopAsync(
CancellationToken cancellationToken) {
_logger.LogInformation("GPS Parsing Service => stopping");
_buffer.CompleteAdding();
_logger.LogInformation("GPS Parsing Service => stopped");
return Task.CompletedTask;
}
// ========================================================================
// Utilities
// ========================================================================
private void Process() {
var options = new ParallelOptions {
MaxDegreeOfParallelism = 8,
CancellationToken = CancellationToken.None
};
Parallel.ForEach(_buffer.GetConsumingEnumerable(), options, async report => {
try {
await _handler.ProcessAsync(report).ConfigureAwait(false);
} catch (Exception e) {
if (_config.IsDevelopment) {
throw;
}
_logger.LogError(e, "GPS Report Service");
}
});
}
private async Task ProcessAsync() {
while (!_buffer.IsCompleted) {
try {
var took = _buffer.TryTake(out var report, 10);
if (!took) {
continue;
}
await _handler.ProcessAsync(report!).ConfigureAwait(false);
} catch (Exception e) {
if (_config.IsDevelopment) {
throw;
}
_logger.LogError(e, "GPS Report Service");
}
}
}
}
public sealed class GpsReportBuffer :
BlockingCollection<GpsReport> {
}
You can't use Parallel methods with async delegates - at least, not yet.
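Since this was written, .NET 6 added Parallel.ForEachAsync, which does accept async delegates (Func<TSource, CancellationToken, ValueTask>) and only completes after every delegate has finished. A minimal sketch, assuming .NET 6 or later; the integer reports and the Task.Delay are placeholders for the real buffer and _handler.ProcessAsync:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int current = 0, peak = 0, processed = 0;
var reports = Enumerable.Range(1, 30); // placeholder for the buffered GPS reports

var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };
await Parallel.ForEachAsync(reports, options, async (report, token) =>
{
    int now = Interlocked.Increment(ref current);
    int seen;
    while (now > (seen = Volatile.Read(ref peak)) &&
           Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

    await Task.Delay(20, token); // placeholder for _handler.ProcessAsync(report)
    Interlocked.Increment(ref processed);
    Interlocked.Decrement(ref current);
});

// Unlike Parallel.ForEach with an async (async void) lambda, this await
// only completes after every delegate has finished.
Console.WriteLine($"processed={processed}, peak={peak}");
```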
Since you already have a "pipeline" style of architecture, I recommend looking into TPL Dataflow. A single ActionBlock may be all that you need, and once you have that working, other blocks in TPL Dataflow may replace other parts of your pipeline.
If you prefer to stick with your existing buffer, then you should use asynchronous concurrency instead of Parallel:
private async Task Process() { // async Task (not void) so the await below compiles and callers can observe completion
var throttler = new SemaphoreSlim(8);
var tasks = _buffer.GetConsumingEnumerable()
.Select(async report =>
{
await throttler.WaitAsync();
try {
await _handler.ProcessAsync(report).ConfigureAwait(false);
} catch (Exception e) {
if (_config.IsDevelopment) {
throw;
}
_logger.LogError(e, "GPS Report Service");
}
finally {
throttler.Release();
}
})
.ToList();
await Task.WhenAll(tasks);
}
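One thing to be aware of in the code above: because WaitAsync is called inside each lambda, ToList() pulls every report out of the buffer and starts a task for each of them right away (and only returns once the buffer is marked complete); only the handler calls are throttled. A variation that waits for a free slot before starting the next report is sketched below. The item count, the limit of 8, and HandleAsync as a stand-in for _handler.ProcessAsync are assumptions for illustration:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var buffer = new BlockingCollection<int>();
for (int i = 0; i < 25; i++) buffer.Add(i);
buffer.CompleteAdding(); // in the service this happens in StopAsync

int current = 0, peak = 0, processed = 0;

// Hypothetical stand-in for _handler.ProcessAsync(report).
async Task HandleAsync(int report)
{
    int now = Interlocked.Increment(ref current);
    int seen;
    while (now > (seen = Volatile.Read(ref peak)) &&
           Interlocked.CompareExchange(ref peak, now, seen) != seen) { }
    await Task.Delay(20);
    Interlocked.Increment(ref processed);
    Interlocked.Decrement(ref current);
}

var throttler = new SemaphoreSlim(8);
var running = new List<Task>();
foreach (var report in buffer.GetConsumingEnumerable())
{
    await throttler.WaitAsync(); // wait for a free slot *before* starting the work
    running.Add(Task.Run(async () =>
    {
        try { await HandleAsync(report); }
        finally { throttler.Release(); } // free the slot even if the handler throws
    }));
}
await Task.WhenAll(running);
Console.WriteLine($"processed={processed}, peak={peak}");
```

The running list still grows with every report; in a service that runs for days you would want to prune completed tasks, or drop the list and wait for all semaphore slots on shutdown.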
You have an event stream processing/dataflow problem, not a parallelism problem. If you use the appropriate classes, like the Dataflow blocks, Channels, or Reactive Extensions the problem is simplified a lot.
Even if you want to use a single buffer and a fat worker method though, the appropriate buffer class is the asynchronous Channel, not BlockingCollection. The code could become as simple as:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
await foreach(GpsMessage msg in _reader.ReadAllAsync(stoppingToken))
{
await _handler.ProcessAsync(msg);
}
}
The first option shows how to use Dataflow to create a pipeline. The second shows how to use a Channel instead of a BlockingCollection to process multiple queued items concurrently.
A pipeline with Dataflow
Once you break the process into independent methods, it's easy to create a pipeline of processing steps using any library.
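async Task<IEnumerable<GpsMessage>> Poller(DateTime time,IList<Device> devices,CancellationToken token=default)
{
//Collect into a list: an async method returning Task<IEnumerable<T>> can't use yield return
var messages=new List<GpsMessage>();
foreach(var device in devices)
{
if(token.IsCancellationRequested)
{
break;
}
var msg=await device.ReadMessage();
messages.Add(msg);
}
return messages;
}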
GpsReport Parser(GpsMessage msg)
{
//Do some parsing magic.
return report;
}
async Task<GpsReport> Enrich(GpsReport report,string connectionString,CancellationToken token=default)
{
//Depend on connection pooling to eliminate the cost of connections
//We may have to use a pool of opened connections otherwise
using var con=new SqlConnection(connectionString);
var extraData=await con.QueryAsync<Extra>(new CommandDefinition(sql,new {deviceId=report.DeviceId},cancellationToken:token));
report.Extra=extraData;
return report;
}
async Task BulkImport(GpsReport[] reports,CancellationToken token=default)
{
using var bcp=new SqlBulkCopy(...);
using var reader=ObjectReader.Create(reports);
...
await bcp.WriteToServerAsync(reader,token);
}
In the BulkImport method I use FastMember's ObjectReader to create an IDataReader wrapper over the reports so I can use them with SqlBulkCopy. Another option would be to convert them to a DataTable, but that would create an extra copy of the data in memory.
Combining all these with Dataflow is relatively easy.
var execOptions=new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
_poller = new TransformManyBlock<DateTime,GpsMessage>(time=>Poller(time,devices));
_parser = new TransformBlock<GpsMessage,GpsReport>(msg=>Parser(msg),execOptions);
_enricher = new TransformBlock<GpsReport,GpsReport>(rpt=>Enrich(rpt,connStr),execOptions);
_batch = new BatchBlock<GpsReport>(50);
_bcpBlock = new ActionBlock<GpsReport[]>(reports=>BulkImport(reports));
Each block has an input buffer and, except for ActionBlock, an output buffer. Each block processes the messages in its input buffer. By default, each block uses only one worker task, but that can be changed through MaxDegreeOfParallelism. Message order is maintained by default, so if we use e.g. 10 worker tasks for the parser block, the messages will still be emitted in the order they were received.
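The ordering guarantee comes from ExecutionDataflowBlockOptions.EnsureOrdered, which defaults to true. The sketch below, assuming System.Threading.Tasks.Dataflow is available, posts items so that earlier ones take longer to process, yet they still come out in posting order. The integers and delays are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// 10 workers; earlier items are deliberately slower than later ones.
var parser = new TransformBlock<int, int>(async n =>
{
    await Task.Delay((10 - n) * 15); // item 0 finishes last
    return n;
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

for (int n = 0; n < 10; n++) parser.Post(n);
parser.Complete();

// Drain the output buffer until the block reports no more output.
var output = new List<int>();
while (await parser.OutputAvailableAsync())
    output.Add(parser.Receive());

Console.WriteLine(string.Join(" ", output));
```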
Next comes linking the blocks.
var linkOptions=new DataflowLinkOptions {PropagateCompletion=true};
_poller.LinkTo(_parser,linkOptions);
_parser.LinkTo(_enricher,linkOptions);
_enricher.LinkTo(_batch,linkOptions);
_batch.LinkTo(_bcpBlock,linkOptions);
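To see linking and completion propagation in isolation, here is a small in-memory sketch with the same shape as the tail of the pipeline (transform, batch, action), assuming System.Threading.Tasks.Dataflow is available. The string-to-int "parser" and the queue collecting batches are placeholders for Parser and BulkImport, and the batch size of 3 is arbitrary:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var execOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 };
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };

// Placeholders for Parser and BulkImport: "parse" raw strings, then "save" batches.
var parser = new TransformBlock<string, int>(raw => int.Parse(raw), execOptions);
var batcher = new BatchBlock<int>(3);
var batches = new ConcurrentQueue<int[]>();
var saver = new ActionBlock<int[]>(batch => batches.Enqueue(batch));

parser.LinkTo(batcher, linkOptions);
batcher.LinkTo(saver, linkOptions);

foreach (var raw in new[] { "1", "2", "3", "4", "5", "6", "7" })
    parser.Post(raw);

// Completing the head block flows down the links; awaiting the tail
// waits until every message, including the partial last batch, is handled.
parser.Complete();
await saver.Completion;

foreach (var b in batches)
    Console.WriteLine(string.Join(",", b));
```

BatchBlock emits a final, smaller batch when it is completed, so the last reports are not left waiting for a full batch of 50.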
After that, a timer can be used to "ping" the head block, the poller, whenever we want:
private void Ping(object state)
{
_poller.Post(DateTime.Now);
}
public Task StartAsync(CancellationToken stoppingToken)
{
_logger.LogInformation("Timed Hosted Service running.");
_timer = new Timer(Ping, null, TimeSpan.Zero,
TimeSpan.FromSeconds(5));
return Task.CompletedTask;
}
To stop the pipeline gracefully, we call Complete() on the head block and await the Completion task on the last block. Assuming the hosted service is similar to the timed background service example:
public async Task StopAsync(CancellationToken cancellationToken)
{
....
_timer?.Change(Timeout.Infinite, 0);
_poller.Complete();
await _bcpBlock.Completion;
...
}
Using Channel as an Async queue
A Channel is a far better alternative for asynchronous publisher/subscriber scenarios than BlockingCollection. Roughly, it's an asynchronous queue that goes to extremes to prevent the publisher from reading, or the subscriber from writing, by forcing callers to use the ChannelWriter and ChannelReader classes. In fact, it's quite common to only pass those classes around, never the Channel instance itself.
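A minimal self-contained sketch of that split, assuming .NET Core 3.0 or later: the publisher only uses the ChannelWriter, the subscriber only the ChannelReader, and the subscriber's await foreach ends on its own once the writer is completed and the buffered messages are drained. The string messages are placeholders for GpsMessage:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();
ChannelWriter<string> writer = channel.Writer;
ChannelReader<string> reader = channel.Reader;

// Publisher: writes a few messages, then signals that no more will come.
var producer = Task.Run(async () =>
{
    for (int i = 1; i <= 5; i++)
        await writer.WriteAsync($"msg-{i}");
    writer.Complete();
});

// Subscriber: the loop ends once the writer is completed
// and every buffered message has been read.
var received = new List<string>();
await foreach (var msg in reader.ReadAllAsync())
    received.Add(msg);

await producer;
Console.WriteLine(string.Join(", ", received));
```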
In your publishing code, you can create a Channel<T> and pass its Reader to the GpsReportService service. Let's assume the publisher is another service that implements an IGpsPublisher interface :
public interface IGpsPublisher
{
ChannelReader<GpsMessage> Reader{get;}
}
and the implementation
Channel<GpsMessage> _channel=Channel.CreateUnbounded<GpsMessage>();
public ChannelReader<GpsMessage> Reader=>_channel.Reader;
private async void Ping(object state)
{
foreach(var device in devices)
{
if(token.IsCancellationRequested)
{
break;
}
var msg=await device.ReadMessage();
await _channel.Writer.WriteAsync(msg);
}
}
public Task StartAsync(CancellationToken stoppingToken)
{
_timer = new Timer(Ping, null, TimeSpan.Zero,
TimeSpan.FromSeconds(5));
return Task.CompletedTask;
}
public Task StopAsync(CancellationToken cancellationToken)
{
_timer?.Change(Timeout.Infinite, 0);
_channel.Writer.Complete();
return Task.CompletedTask;
}
This can be passed to GpsReportService as a dependency that will be resolved by the DI container:
public sealed class GpsReportService : BackgroundService
{
private readonly ChannelReader<GpsMessage> _reader;
public GpsReportService(
IGpsPublisher publisher,
...)
{
_reader = publisher.Reader;
...
}
And used
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
await foreach(GpsMessage msg in _reader.ReadAllAsync(stoppingToken))
{
await _handler.ProcessAsync(msg);
}
}
Once the publisher completes, the subscriber loop will also complete once all messages are processed.
To process in parallel, you can start multiple loops concurrently:
async Task ProcessReader(ChannelReader<GpsMessage> reader,CancellationToken token)
{
await foreach(GpsMessage msg in reader.ReadAllAsync(token))
{
await _handler.ProcessAsync(msg);
}
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
var tasks=Enumerable.Range(0,10)
.Select(_=>ProcessReader(_reader,stoppingToken))
.ToArray();
await Task.WhenAll(tasks);
}
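A runnable sketch of the multiple-readers approach, assuming a console app on .NET Core 3.0 or later. Four loops compete for messages from the same ChannelReader, so each message is handled by exactly one of them. The integer messages, the 5 ms delay, and the readerId parameter are placeholders:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<int>();
var handledBy = new ConcurrentDictionary<int, int>(); // message -> reader id

// Each reader loop competes for messages; a message is read by only one loop.
async Task ProcessReader(int readerId, ChannelReader<int> reader, CancellationToken token)
{
    await foreach (var msg in reader.ReadAllAsync(token))
    {
        await Task.Delay(5, token); // placeholder for _handler.ProcessAsync(msg)
        handledBy[msg] = readerId;
    }
}

var tasks = Enumerable.Range(0, 4)
    .Select(id => ProcessReader(id, channel.Reader, CancellationToken.None))
    .ToArray();

for (int i = 0; i < 100; i++)
    await channel.Writer.WriteAsync(i);
channel.Writer.Complete(); // lets every reader loop finish once drained

await Task.WhenAll(tasks);
Console.WriteLine($"handled {handledBy.Count} messages using {handledBy.Values.Distinct().Count()} readers");
```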
Explaining the pipeline
I have a similar situation: every 15 minutes I request air ticket sales reports from airlines (actually GDSs), parse them to extract data and ticket numbers, download the ticket record for each ticket to get some extra data and save everything to the database. I have to do that for 20+ cities (ticket reports are per city) with each report having from 10 to over 100K tickets.
This almost begs for a pipeline. Using your example, you can create a pipeline with the following steps/blocks:
Listen for GPS messages and emit the unparsed message.
Parse the message and emit the parsed message
Load any extra data needed per message and emit a combined record
Handle the combined record and emit the result
(Optional) batch results
Save the results to the database
All three options (Dataflow, Channels, Rx) take care of buffering between the steps. Dataflow is a some-assembly-required library for pipelines processing independent events, Rx is ready-made to analyze streams of events where time is important (eg to calculate average speed in a sliding window), Channels is Lego bricks that can do anything but need to be put together.
Why not Parallel.ForEach
Parallel.ForEach is meant for data parallelism, not async operations. It's meant to process large chunks of in-memory data, independent of each other. Amdahl's Law explains that parallelization benefits are limited by the synchronous part of an operation, so all data parallelism libraries try to reduce that by partitioning the data and using one core/machine/node to process each partition.
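For reference, Amdahl's Law is usually written with p as the fraction of the work that can run in parallel and N as the number of workers. The values in the worked example are illustrative only:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}

% Example: p = 0.9, N = 8
S(8) = \frac{1}{0.1 + 0.1125} = \frac{1}{0.2125} \approx 4.7

% Upper bound with unlimited workers
\lim_{N \to \infty} S(N) = \frac{1}{1 - p} = 10
```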
Parallel.ForEach also works by partitioning the data and using roughly one worker task per CPU core, to reduce synchronization between cores. It even uses the calling thread as one of the workers, which is why the call blocks until all items are processed. When all cores are busy, why not use that thread? It wouldn't be able to run anything else anyway.
The Parallel.ForEach employs chunk partitioning by default, which is intended for reducing the synchronization overhead in CPU-intensive applications, but can result in problematic behavior in some usage scenarios. The chunk partitioning can be disabled by passing as argument a Partitioner<T> instead of an IEnumerable<T>:
Parallel.ForEach(Partitioner.Create(_buffer.GetConsumingEnumerable(),
EnumerablePartitionerOptions.NoBuffering), options, ...
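A minimal self-contained sketch of the NoBuffering partitioner, where a producer trickles items into a BlockingCollection<T> slowly, the way GPS reports would arrive. With NoBuffering, each worker takes one item at a time instead of waiting for a chunk to fill. The item count, the delays, and the synchronous Thread.Sleep standing in for the work are assumptions; the delegate is deliberately synchronous, since Parallel.ForEach does not understand async delegates:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var buffer = new BlockingCollection<int>();
int processed = 0;

// Producer: adds items slowly, like reports arriving over time.
var producer = Task.Run(async () =>
{
    for (int i = 0; i < 10; i++)
    {
        buffer.Add(i);
        await Task.Delay(10);
    }
    buffer.CompleteAdding();
});

// NoBuffering: workers take items one at a time as soon as they are available.
var source = Partitioner.Create(buffer.GetConsumingEnumerable(),
                                EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(source, new ParallelOptions { MaxDegreeOfParallelism = 4 }, item =>
{
    Thread.Sleep(5); // synchronous placeholder for the real work
    Interlocked.Increment(ref processed);
});

await producer;
Console.WriteLine($"processed={processed}");
```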
You can also find a custom partitioner, tailored specifically for BlockingCollection<T>s, in this article: ParallelExtensionsExtras Tour – #4 – BlockingCollectionExtensions
That said, the Parallel.ForEach is not async-friendly, meaning that it doesn't understand async delegates. The lambda passed is async void, which is something to avoid. So I would recommend using an ActionBlock<T> instead.
I have following code:
while (!cancellationToken.IsCancellationRequested)
{
var connection = await listener.AcceptAsync(cancellationToken);
HandleConnectionAsync(connection, cancellationToken)
.FireAndForget(HandleException);
}
The FireAndForget is an extension method:
public static async void FireAndForget(this ValueTask task, Action<Exception> exceptionHandler)
{
try
{
await task.ConfigureAwait(false);
}
catch (Exception e)
{
exceptionHandler.Invoke(e);
}
}
The while loop is the server lifecycle. When a new connection is accepted, it starts a "background task" to handle this new connection, and the while loop goes straight back to accepting new connections without awaiting anything, so the lifecycle is never paused.
I cannot await HandleConnectionAsync (pause the lifecycle) here, because I want to immediately accept another connection (if there is one) and be able to handle multiple connections concurrently. HandleConnectionAsync is I/O bound and handles one connection at a time until it is closed (the task completes after some time).
The connections have to be handled separately: I don't want a situation where an error while handling one connection has any influence on other connections.
The "fire and forget" solution I have here works, but the general rule is to always await asynchronous methods and never use async void.
It seems like I've broken the rules, so is there a better, maybe more reliable way to handle variable (number of tasks varies in time) number of asynchronous I/O bound tasks concurrently in a situation described here?
More information:
Each call to AcceptAsync allocates system resources even before returning the connection, and I want to avoid that whenever possible (the connection may not be returned for hours, i.e. the code may "await" for hours, until some external client decides to connect to my server). It is best to assume that this method must not be called concurrently/in parallel: just one AcceptAsync at a time is enough
Please take into account that I can have millions of clients per day connecting and disconnecting to my server and server (while loop) can work for many many days
I don't know how many connections I will need to handle at a specific time
I do know the maximum number of connections my program will be able to handle concurrently
If I hit the maximum number of connections, AcceptAsync won't return a new connection until some other active connection closes, so I don't need to worry about that. However, any solution based on this limit has to take into account that active connections may close and I still need to handle new ones: the number of connections varies over time. "Fire and forget" has no issues with that
The code for HandleConnectionAsync is not relevant: it just handles one connection at a time until closed (the task completes after some time) and is I/O bound (HandleConnectionAsync handles one connection at a time, but of course we can start multiple HandleConnectionAsync tasks to handle multiple connections concurrently, which is what I did with "fire and forget")
I'm assuming that changing to something like SignalR isn't an acceptable solution. That would be my first recommendation.
Custom server sockets is a scenario where some kind of "fire and forget" is acceptable. I'm considering adding a "task manager" kind of type to AsyncEx to make this kind of solution easier, but haven't done it yet.
The bottom line is that you need to manage your list of connections yourself. The "connection" object can include a Task that represents the handling loop; that's fine. It's also useful (especially for debugging or management purposes) to have other properties on there as well, such as the remote IP.
So I would approach it something like this:
private readonly object _mutex = new object();
private readonly List<State> _connections = new List<State>();
private void Add(State state)
{
lock (_mutex)
_connections.Add(state);
}
private void Remove(State state)
{
lock (_mutex)
_connections.Remove(state);
}
public async Task RunAsync(CancellationToken cancellationToken)
{
while (true)
{
var connection = await listener.AcceptAsync(cancellationToken);
Add(new State(this, connection, cancellationToken));
}
}
private sealed class State
{
private readonly Parent _parent;
public State(Parent parent, Connection connection, CancellationToken cancellationToken)
{
_parent = parent;
Task = ExecuteAsync(connection, cancellationToken);
}
private async Task ExecuteAsync(Connection connection, CancellationToken cancellationToken) // instance method: it uses _parent and this
{
try { await HandleConnectionAsync(connection, cancellationToken); }
finally { _parent.Remove(this); }
}
public Task Task { get; }
// other properties as desired, e.g., RemoteAddress
}
You now have a collection of connections. You can either ignore the tasks in the State objects (as the code above is doing), which is just like fire-and-forget. Or you can await them all at some point. E.g.:
public async Task RunAsync(CancellationToken cancellationToken)
{
try
{
while (true)
{
var connection = await listener.AcceptAsync(cancellationToken);
Add(new State(this, connection, cancellationToken));
}
}
catch (OperationCanceledException)
{
// Wait for all connections to cancel.
// I'm not really sure why you would *want* to do this, though.
List<State> connections;
lock (_mutex) { connections = _connections.ToList(); }
await Task.WhenAll(connections.Select(x => x.Task));
}
}
Then it's easy to extend the State object so you can do things that are sometimes useful for a server app to do, e.g.:
List all remote addresses this server has connections to.
Wait until a specific connection is done.
...
Notes:
Use one pattern for cancellation. Passing the token will result in an OperationCanceledException, which is the normal cancellation pattern. The code also was formerly doing a while (!IsCancellationRequested), resulting in a successful completion on cancellation, which is not the normal cancellation pattern. So I removed that so the code is no longer using two cancellation patterns.
When working with raw sockets, in the general case, you need to be constantly reading (even when you're writing) and periodically writing (even if you have no data to send). So your HandleConnectionAsync should be starting an asynchronous reader and writer and then using Task.WhenAll.
I removed the call to HandleException because (probably) whatever it does should be handled by State.ExecuteAsync. It's not hard to add it back in if necessary.
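Below is a compact, self-contained sketch of the same bookkeeping idea, using a dictionary keyed by a connection id instead of a State class. HandleConnectionAsync here is a hypothetical stand-in that just delays (and fails for one connection to show the error isolation); the loop that calls Start simulates the accept loop:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var mutex = new object();
var active = new Dictionary<int, Task>(); // connection id -> handling task
int completed = 0;

// Hypothetical stand-in for HandleConnectionAsync: keeps a "connection" open briefly.
async Task HandleConnectionAsync(int id)
{
    await Task.Delay(10 + (id % 5) * 10);
    if (id == 7) throw new InvalidOperationException("client sent garbage");
}

async Task ExecuteAsync(int id)
{
    // Yield so the rest runs on the thread pool; if the handler finishes very
    // quickly, the finally block still has to wait for the lock held by Start,
    // so the entry is always added before it is removed.
    await Task.Yield();
    try
    {
        await HandleConnectionAsync(id);
    }
    catch (Exception ex)
    {
        // One failing connection is logged and does not affect the others.
        Console.WriteLine($"connection {id} failed: {ex.Message}");
    }
    finally
    {
        Interlocked.Increment(ref completed);
        lock (mutex) active.Remove(id);
    }
}

void Start(int id)
{
    lock (mutex) active[id] = ExecuteAsync(id);
}

// Simulate 20 accepted connections; the accept loop never awaits the handlers.
for (int i = 0; i < 20; i++) Start(i);

// On shutdown, snapshot whatever is still running and wait for it.
Task[] snapshot;
lock (mutex) snapshot = active.Values.ToArray();
await Task.WhenAll(snapshot);

lock (mutex) Console.WriteLine($"completed={completed}, still active={active.Count}");
```

Because each entry is removed in the handler's own finally block, awaiting the snapshot guarantees the dictionary is empty afterwards.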
If there is a limit to the maximum number of allowed concurrent tasks, you should use SemaphoreSlim:
int allowedConcurrent = //..
var semaphore = new SemaphoreSlim(allowedConcurrent);
var tasks = new List<Task>();
while (!cancellationToken.IsCancellationRequested)
{
await semaphore.WaitAsync(cancellationToken); // Will return immediately if the number of concurrent tasks does not exceed allowed
// Accept outside the lambda so only one AcceptAsync is pending at a time
var connection = await listener.AcceptAsync(cancellationToken);
Func<Task> func = async () =>
{
try
{
await HandleConnectionAsync(connection, cancellationToken);
}
finally
{
semaphore.Release(); // release the slot even if the handler throws
}
};
tasks.Add(func()); // in a long-running server, prune completed tasks from this list
}
await Task.WhenAll(tasks);
This will accumulate the tasks into a list, then Task.WhenAll can wait for them all to complete.
First things first:
Don't do async void...
Then you can implement a producer/consumer pattern for this. The pseudocode below is just a guide; you need to make sure your Consumer is a singleton in your app.
public class Data
{
public Uri Url { get; set; }
}
public class Producer
{
private Consumer _consumer = new Consumer();
public void DoStuff()
{
var data = new Data();
_consumer.Enqueue(data);
}
}
public class Consumer
{
private readonly List<Data> _toDo = new List<Data>();
private volatile bool _stop = false; // read by the background loop, so mark it volatile
public Consumer()
{
Task.Run(Loop); // Task.Run unwraps the async Loop; Task.Factory.StartNew would return a Task<Task>
}
private async Task Loop()
{
while (!_stop)
{
Data toDo = null;
lock (_toDo)
{
if (_toDo.Any())
{
toDo = _toDo.First();
_toDo.RemoveAt(0);
}
}
if (toDo != null)
{
await DoSomething(toDo);
}
await Task.Delay(TimeSpan.FromSeconds(1)); // don't block a thread-pool thread inside async code
}
}
private async Task DoSomething(Data toDo)
{
// YOUR ASYNC STUFF HERE
}
public void Enqueue(Data data)
{
lock (_toDo)
{
_toDo.Add(data);
}
}
}
So your calling method produces what you need to do the background task and the consumer performs that, that's another fire and forget.
You should also consider what happens if something goes wrong at the application level: should you persist the Data in Consumer.Enqueue(), so that when the app starts again it can do the missing job?
Hope this helps
According to the documentation:
A dataflow block is considered completed when it is not currently processing a message and when it has guaranteed that it will not process any more messages.
This behavior is not ideal in my case. I want to be able to cancel the job at any time, but the processing of each individual action takes a long time. So when I cancel the token, the effect is not immediate. I must wait for the currently processed item to complete. I have no way to cancel the actions directly, because the API I use is not cancelable. Can I do anything to make the block ignore the currently running action, and complete instantly?
Here is an example that demonstrates my problem. The token is canceled after 500 msec, and the duration of each action is 1000 msec:
static async Task Main()
{
var cts = new CancellationTokenSource(500);
var block = new ActionBlock<int>(async x =>
{
await Task.Delay(1000);
}, new ExecutionDataflowBlockOptions() { CancellationToken = cts.Token });
block.Post(1); // I must wait for this one to complete
block.Post(2); // This one is ignored
block.Complete();
var stopwatch = Stopwatch.StartNew();
try
{
await block.Completion;
}
catch (OperationCanceledException)
{
Console.WriteLine($"Canceled after {stopwatch.ElapsedMilliseconds} msec");
}
}
Output:
Canceled after 1035 msec
The desired output would be a cancellation after ~500 msec.
Based on this excerpt from your comment...:
What I want to happen in case of a cancellation request is to ignore the currently running workitem. I don't care about it any more, so why I have to wait for it?
...and assuming you are truly OK with leaving the Task running, you can simply wrap the job you wish to call inside another Task which will constantly poll for cancellation or completion, and cancel that Task instead. Take a look at the following "proof-of-concept" code that wraps a "long-running" task inside another Task "tasked" with constantly polling the wrapped task for completion, and a CancellationToken for cancellation (completely "spur-of-the-moment" status, you will want to re-adapt it a bit of course):
public class LongRunningTaskSource
{
public Task LongRunning(int milliseconds)
{
return Task.Run(() =>
{
Console.WriteLine("Starting long running task");
Thread.Sleep(milliseconds);
Console.WriteLine("Finished long running task");
});
}
public Task LongRunningTaskWrapper(int milliseconds, CancellationToken token)
{
Task task = LongRunning(milliseconds);
Task wrapperTask = Task.Run(() =>
{
while (true)
{
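//Poll every 10 ms instead of spinning a full core.
Thread.Sleep(10);
//Check for completion (you could, of course, do different things
//depending on whether it is faulted or completed).
//IsCompleted is used rather than Status != Running, because a freshly
//started task may still be in the WaitingToRun state.
if (task.IsCompleted)
break;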
//Check for cancellation.
if (token.IsCancellationRequested)
{
Console.WriteLine("Aborting Task.");
token.ThrowIfCancellationRequested();
}
}
}, token);
return wrapperTask;
}
}
Using the following code:
static void Main()
{
LongRunningTaskSource longRunning = new LongRunningTaskSource();
CancellationTokenSource cts = new CancellationTokenSource(1500);
Task task = longRunning.LongRunningTaskWrapper(3000, cts.Token);
//Sleep long enough to let things roll on their own.
Thread.Sleep(5000);
Console.WriteLine("Ended Main");
}
...produces the following output:
Starting long running task
Aborting Task.
Exception thrown: 'System.OperationCanceledException' in mscorlib.dll
Finished long running task
Ended Main
The wrapped Task obviously still completes in its own good time. If that is acceptable for you (which is often not the case), this should hopefully fit your needs.
As a supplementary example, running the following code (letting the wrapped Task finish before time-out):
static void Main()
{
    LongRunningTaskSource longRunning = new LongRunningTaskSource();
    CancellationTokenSource cts = new CancellationTokenSource(3000);
    Task task = longRunning.LongRunningTaskWrapper(1500, cts.Token);
    // Sleep long enough to let things roll on their own.
    Thread.Sleep(5000);
    Console.WriteLine("Ended Main");
}
...produces the following output:
Starting long running task
Finished long running task
Ended Main
So the task started and finished before the time-out, and nothing had to be cancelled. The caller itself is never blocked by the wrapper while waiting (the Thread.Sleep in Main only keeps the console app alive). As you probably already know, if you know what the long-running code uses behind the scenes, it would be good to clean that up when cancelling.
Hopefully, you can adapt this example to pass something like this to your ActionBlock.
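The same "cancel the wrapper, not the work" idea can also be sketched without a polling loop, by racing the wrapped task against a delay that only the token can end. This is a minimal sketch: WithCancellation is a hypothetical extension-method name, and on .NET 6 or later the built-in Task.WaitAsync(CancellationToken) offers similar behaviour. As with the polling wrapper, the wrapped task keeps running after cancellation.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskCancellationExtensions
{
    // Hypothetical helper: completes when `task` completes, or throws
    // OperationCanceledException as soon as `token` is canceled.
    // The wrapped task itself is NOT stopped.
    public static async Task WithCancellation(this Task task, CancellationToken token)
    {
        // A delay that never elapses on its own; only the token can end it.
        Task cancellationTask = Task.Delay(Timeout.Infinite, token);
        Task first = await Task.WhenAny(task, cancellationTask).ConfigureAwait(false);
        if (first == cancellationTask)
        {
            token.ThrowIfCancellationRequested();
        }
        // Propagate the wrapped task's outcome (success, fault or cancellation).
        await task.ConfigureAwait(false);
    }
}
```

Usage would be something like `await longRunning.LongRunning(3000).WithCancellation(cts.Token);`. One trade-off: if the token is never cancelled, the infinite Task.Delay stays registered with it until the CancellationTokenSource is disposed.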
Disclaimer & Notes
I am not familiar with the TPL Dataflow library, so this is just a workaround. Also, if all you have is, for example, a synchronous method call that you have no influence over at all, then you will need two tasks: one wrapper task to wrap the synchronous call, and another one to wrap that wrapper task and add the continuous status polling and cancellation checks.
I am trying to implement long-running operation handling on the server without push notifications.
All my project methods are async. All methods outside the web project await with ConfigureAwait(false).
My web project references a library that manages the long-running operations.
On my initial request, I fire and forget whatever I think may take longer:
// my fire and forget - the task is not awaited
longRunningOperation.RunAsync();
// add my delay
var result = await Task.WhenAny(longRunningOperation.Task, Task.Delay(LongRunningConfiguration.Instance.InitialRequestReleaseTime)).ConfigureAwait(false);
// if the task finishes in time, return normally; otherwise create a long-running handler
if (result == longRunningOperation.Task)
{
    // it is OK
}
else
{
    Task task = Task.Run(async () =>
    {
        await longRunningOperation.Task;
    });
    monitorTask = new ActiveMonitorTask(longRunningOperation, task)
    {
        Id = Guid.NewGuid()
    };
    _monitorStateSession.Add(monitorTask);
}
At the moment, only one of my operations is implemented to support long-running handling.
Most of the time, the tasks for that operation go to the Done status, but from time to time they hang in WaitingForActivation.
Any suggestions on how to track down the problem or check what may cause it?
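One thing worth knowing when reading these statuses: Tasks returned by async methods (and the proxy task returned by Task.Run(async () => ...)) are promise-style tasks. They report WaitingForActivation for their whole pending lifetime and then move straight to a final state, so a task that "hangs" in WaitingForActivation usually means the awaited operation itself never completed, not that the scheduler failed to start it. The sketch below only demonstrates that status behaviour; PromiseTaskStatusDemo and SlowOperationAsync are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

public static class PromiseTaskStatusDemo
{
    // Hypothetical async method: its Task is promise-style and is never
    // scheduled itself, so it reports WaitingForActivation until it finishes.
    public static async Task<int> SlowOperationAsync(Task gate)
    {
        await gate.ConfigureAwait(false);
        return 42;
    }

    public static (TaskStatus Before, TaskStatus After) Observe()
    {
        var gate = new TaskCompletionSource<bool>();
        Task<int> op = SlowOperationAsync(gate.Task);
        TaskStatus before = op.Status; // still pending: WaitingForActivation
        gate.SetResult(true);          // let the awaited operation complete
        op.Wait();
        return (before, op.Status);    // now RanToCompletion
    }
}
```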
Regards,
Boris
With the TPL we have CancellationTokenSource, which provides tokens that are useful for cooperative cancellation of the current task (or of its start).
Question:
How long does it take for a cancellation request to propagate to all hooked-up running tasks?
Is there any place where code could check that, "from now on", every interested Task will find that cancellation has been requested?
Why is this needed?
I would like a stable unit test that shows cancellation works in our code.
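Regarding the propagation question itself: CancellationTokenSource.Cancel() is not lazy. It marks the source as canceled and invokes the callbacks registered on its tokens synchronously on the calling thread (except callbacks registered to run on a captured SynchronizationContext) before it returns; after that, IsCancellationRequested reports true. A running Task only notices the request the next time its own code checks the token, so what matters is when the task checks, not when the request "arrives". A minimal sketch (CancellationPropagationDemo is a hypothetical name):

```csharp
using System;
using System.Threading;

public static class CancellationPropagationDemo
{
    public static (bool CallbackRan, bool Requested) CancelAndObserve()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            bool callbackRan = false;
            // Callbacks registered on the token run inside Cancel().
            using (token.Register(() => callbackRan = true))
            {
                cts.Cancel();
                // Both effects are already visible once Cancel() returns.
                return (callbackRan, token.IsCancellationRequested);
            }
        }
    }
}
```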
Problem details:
We have an "Executor" that produces tasks; these tasks wrap some long-running actions. The main job of the executor is to limit how many actions run concurrently. All of these tasks can be cancelled individually, and the actions also respect the CancellationToken internally.
I would like to provide a unit test showing that when cancellation occurs while a task is waiting for a slot to start its action, that task cancels itself (eventually) and does not start executing the action.
So the idea was to prepare a LimitingExecutor with a single slot, then start a blocking action that requests cancellation when unblocked, and then "enqueue" a test action that should fail if it is executed. With that setup, the test calls unblock and then asserts that the test action's task throws TaskCanceledException when awaited.
[Test]
public void RequestPropagationTest()
{
    using (var setupEvent = new ManualResetEvent(initialState: false))
    using (var cancellation = new CancellationTokenSource())
    using (var executor = new LimitingExecutor())
    {
        // System-state setup action:
        var cancellingTask = executor.Do(() =>
        {
            setupEvent.WaitOne();
            cancellation.Cancel();
        }, CancellationToken.None);
        // Main work action:
        var actionTask = executor.Do(() =>
        {
            throw new InvalidOperationException(
                "This action should be cancelled!");
        }, cancellation.Token);
        // Wait until this `Task` starts, so it gets the opportunity
        // to cancel itself, and the expected exception does not come
        // from Task.Run refusing to start the action because of the token:
        while (actionTask.Status < TaskStatus.Running)
            Thread.Sleep(millisecondsTimeout: 1);
        // Unblock the slot in the executor for the 'main work action'
        // by finishing the 'system-state setup action', which will
        // finally request the "global" cancellation:
        setupEvent.Set();
        Assert.DoesNotThrowAsync(
            async () => await cancellingTask);
        Assert.ThrowsAsync<TaskCanceledException>(
            async () => await actionTask);
    }
}
public class LimitingExecutor : IDisposable
{
    private const int UpperLimit = 1;
    private readonly Semaphore _semaphore
        = new Semaphore(UpperLimit, UpperLimit);
    public Task Do(Action work, CancellationToken token)
        => Task.Run(() =>
        {
            _semaphore.WaitOne();
            try
            {
                token.ThrowIfCancellationRequested();
                work();
            }
            finally
            {
                _semaphore.Release();
            }
        }, token);
    public void Dispose()
        => _semaphore.Dispose();
}
An executable demo (via NUnit) of this problem can be found on GitHub.
However, this test sometimes fails (the expected TaskCanceledException is not thrown), on my machine in maybe 1 of 10 runs. A kind of "solution" to this problem is to insert Thread.Sleep right after the cancellation request. Even with a 3-second sleep the test still sometimes fails (found after about 20 runs), and when it passes, that long wait is usually unnecessary (I guess). For reference, please see the diff.
The "other problem" was to ensure that the cancellation comes from the "waiting time" and not from Task.Run, because the ThreadPool could be busy (with other executing tests) and could postpone the start of the second task until after the cancellation request, which would make this test "falsely green". The "easy fix by hack" was to actively wait until the second task starts, i.e. until its Status becomes TaskStatus.Running. Please check the version under this branch and see that, without this hack, the test is sometimes "green", so the demonstrated bug could slip through it.
Your test method assumes that cancellingTask always takes the slot (enters the semaphore) in LimitingExecutor before actionTask. Unfortunately, this assumption is wrong: LimitingExecutor does not guarantee this, and it is just a matter of luck which of the two tasks takes the slot first (on my computer, actionTask wins in something like 5% of runs).
To resolve this problem, you need another ManualResetEvent that allows the main thread to wait until cancellingTask actually occupies the slot:
using (var slotTaken = new ManualResetEvent(initialState: false))
using (var setupEvent = new ManualResetEvent(initialState: false))
using (var cancellation = new CancellationTokenSource())
using (var executor = new LimitingExecutor())
{
    // System-state setup action:
    var cancellingTask = executor.Do(() =>
    {
        // This is called from inside the semaphore, so it's
        // certain that this task occupies the only available slot.
        slotTaken.Set();
        setupEvent.WaitOne();
        cancellation.Cancel();
    }, CancellationToken.None);
    // Wait until cancellingTask takes the slot
    slotTaken.WaitOne();
    // Now it's guaranteed that cancellingTask takes the slot, not the actionTask
    // ...
}
.NET Framework doesn't provide an API to detect a task's transition to the Running state, so if you don't like polling the Status property with Thread.Sleep() in a loop, you'll need to modify LimitingExecutor.Do() to provide this information, probably using another ManualResetEvent, e.g.:
public Task Do(Action work, CancellationToken token, ManualResetEvent taskRunEvent = null)
    => Task.Run(() =>
    {
        // Optional notification to the caller that task is now running
        taskRunEvent?.Set();
        // ...
    }, token);
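Separately from the ordering issue in the test, note that LimitingExecutor.Do only looks at the token after Semaphore.WaitOne() has returned, so a queued task cannot react to cancellation while it is still waiting for the slot. If that matters, a possible alternative is an async executor built on SemaphoreSlim.WaitAsync(CancellationToken), which abandons the wait as soon as the token is cancelled. This is a sketch under that assumption; AsyncLimitingExecutor and DoAsync are hypothetical names, not part of the code above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical async variant of LimitingExecutor: a task waiting for a slot
// observes the token while it waits, not only after acquiring the slot.
public sealed class AsyncLimitingExecutor : IDisposable
{
    private readonly SemaphoreSlim _slots;

    public AsyncLimitingExecutor(int upperLimit = 1)
    {
        _slots = new SemaphoreSlim(upperLimit, upperLimit);
    }

    public async Task DoAsync(Action work, CancellationToken token)
    {
        // Throws OperationCanceledException if the token is cancelled while
        // waiting; in that case the slot was never taken, so nothing is released.
        await _slots.WaitAsync(token).ConfigureAwait(false);
        try
        {
            token.ThrowIfCancellationRequested();
            await Task.Run(work, token).ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }

    public void Dispose() => _slots.Dispose();
}
```

Because the first DoAsync call acquires a free slot synchronously, a test can rely on the first action holding the slot by the time DoAsync returns, which removes the need for the extra slotTaken event.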