I have a WCF Rest service that exposes a web method which should start off a long running process and then immediately return an id representing the task that can be used to track the status of the task.
[WebGet]
public Task<Guid> LongRunningProcess()
{
    var taskId = Guid.NewGuid();
    var task = Task.Factory.StartNew(() =>
    {
        //Perform long running task
    });
    task.ContinueWith(t =>
    {
        //Send a notification to the client that the task has completed.
    });
    return taskId;
}
My question is: is this the correct way to do it, or is there a better and more lightweight approach?
My understanding is that if your work is CPU bound, you are better off executing the work synchronously. With your approach the request will get parked and the original request thread will get freed up, but then you are handing off the work to another thread, and the request doesn't complete until that thread is finished. You might as well do the work in the original thread.
If you had some IO in there it would make sense to make that asynchronous as asynchronous IO does not use a thread and it would free up your request thread to handle other requests, which improves your scalability.
UPDATE
I think the approach you are taking is good, but given that you are using .NET 4.5 I'd use async-await as it results in simpler code. I would then use the asynchronous API of IO operations and await its result. Eg:
[WebGet]
public async Task<Guid> LongRunningProcess()
{
    var taskId = Guid.NewGuid();
    // IO bound operation
    var dbResult = await readFromDbAsync();
    // CPU bound?
    generateReport(dbResult);
    // IO bound operation
    await sendNotification();
    return taskId;
}
If you are not familiar with async-await, I have written an intro to it here.
What you sketched up there is (with minor correction) a way to achieve what you want to do. The harder part is the notification of clients (we do so using SignalR hubs successfully, but the exact mechanism is up to you).
The minor correction I spoke about is that the return type of your method should be just Guid in your code above.
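A minimal sketch of that correction, assuming an in-memory status store is acceptable. JobTracker and JobStatus are hypothetical names introduced for illustration, not part of WCF or the TPL, and the client notification is left as a comment:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public enum JobStatus { Running, Completed, Failed }

// Hypothetical helper: starts work in the background and tracks it by id.
public class JobTracker
{
    private readonly ConcurrentDictionary<Guid, JobStatus> _jobs =
        new ConcurrentDictionary<Guid, JobStatus>();

    // Returns immediately with an id; the work continues on the thread pool.
    public Guid Start(Action work)
    {
        var id = Guid.NewGuid();   // new Guid() would always be the all-zero Guid
        _jobs[id] = JobStatus.Running;

        Task.Run(work).ContinueWith(t =>
        {
            // Record the outcome; a real service would notify the client here.
            _jobs[id] = t.IsFaulted ? JobStatus.Failed : JobStatus.Completed;
        });

        return id;
    }

    // Returns null for an unknown id.
    public JobStatus? GetStatus(Guid id) =>
        _jobs.TryGetValue(id, out var status) ? status : (JobStatus?)null;
}
```

The web method would then just return tracker.Start(...) as a plain Guid, and a second method could expose GetStatus for polling.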
Some notes:
Performance-wise the TPL scales pretty well (IMO), but on a larger scale you may want to be able to distribute those long-running tasks over multiple servers etc...
For this case I'd recommend you have a look at distributed job queues (Resque for example, .NET ports exist), which are perfect for this kind of use case.
I am somewhat new to parallel programming in C# (when I started my project I worked through the MSDN examples for TPL) and would appreciate some input on the following example code.
It is one of several background worker tasks. This specific task pushes status messages to a log.
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new ConcurrentQueue<string>();
var backgroundUiTask = new Task(
() =>
{
while (!uiCts.IsCancellationRequested)
{
while (globalMsgQueue.Count > 0)
ConsumeMsgQueue();
Thread.Sleep(backgroundUiTimeOut);
}
},
uiCts.Token);
// Somewhere else entirely
backgroundUiTask.Start();
Task.WaitAll(backgroundUiTask);
I'm asking for professional input after reading several topics such as "Alternatives to using Thread.Sleep for waiting", "Is it always bad to use Thread.Sleep()?", "When to use Task.Delay, when to use Thread.Sleep?", and "Continuous polling using Tasks".
These prompt me to use Task.Delay instead of Thread.Sleep as a first step and to introduce TaskCreationOptions.LongRunning.
But I wonder what other caveats I might be missing? Is polling the MsgQueue.Count a code smell? Would a better version rely on an event instead?
First of all, there's no reason to use Task.Start or use the Task constructor. Tasks aren't threads, they don't run themselves. They are a promise that something will complete in the future and may or may not produce any results. Some of them will run on a threadpool thread. Use Task.Run to create and run the task in a single step when you need to.
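As a small illustration of that point, here is a sketch that creates and starts a task in a single step with Task.Run. TaskRunDemo and SumAsync are hypothetical names used only for this example:

```csharp
using System;
using System.Threading.Tasks;

public static class TaskRunDemo
{
    // Task.Run queues the delegate to the thread pool and returns a task
    // that is already started; no constructor or Start() call is needed.
    public static Task<int> SumAsync(int n) =>
        Task.Run(() =>
        {
            int sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum;
        });
}
```

A caller would simply write `int total = await TaskRunDemo.SumAsync(100);`.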
I assume the actual problem is how to create a buffered background worker. .NET already offers classes that can do this.
ActionBlock<T>
The ActionBlock class already implements this and a lot more - it allows you to specify how big the input buffer is, how many tasks will process incoming messages concurrently, supports cancellation and asynchronous completion.
A logging block could be as simple as this :
_logBlock=new ActionBlock<string>(msg=>File.AppendAllText("myLog.txt",msg));
The ActionBlock class itself takes care of buffering the inputs, feeding new messages to the worker function when they arrive, potentially blocking senders if the buffer gets full, etc. There's no need for polling.
Other code can use Post or SendAsync to send messages to the block :
_logBlock.Post("some message");
When we are done, we can tell the block to Complete() and await its Completion task so that any remaining messages are processed :
_logBlock.Complete();
await _logBlock.Completion;
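Putting those pieces together, a self-contained sketch might look like the following. It assumes the System.Threading.Tasks.Dataflow library is available (a NuGet package on older frameworks); LogBlockDemo is a hypothetical name, the capacity of 100 is arbitrary, and an in-memory list stands in for the log file so the example has no side effects:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class LogBlockDemo
{
    public static async Task<List<string>> RunAsync()
    {
        var written = new List<string>();   // stands in for the log file

        var logBlock = new ActionBlock<string>(
            msg => written.Add(msg),
            new ExecutionDataflowBlockOptions
            {
                BoundedCapacity = 100,       // senders wait once 100 messages are buffered
                MaxDegreeOfParallelism = 1   // one message at a time, so List<T> is safe here
            });

        for (int i = 0; i < 5; i++)
        {
            // SendAsync waits asynchronously if the buffer is full;
            // Post would return false instead of waiting.
            await logBlock.SendAsync($"message {i}");
        }

        logBlock.Complete();         // signal that no more input is coming
        await logBlock.Completion;   // wait until buffered messages are processed
        return written;
    }
}
```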
Channels
A newer, lower-level option is to use Channels. You can think of channels as a kind of asynchronous queue, although they can be used to implement complex processing pipelines. If ActionBlock was written today, it would use Channels internally.
With channels, you need to provide the "worker" task yourself. There's no need for polling though, as the ChannelReader class allows you to read messages asynchronously or even use await foreach.
The writer method could look like this :
public ChannelWriter<string> LogIt(string path,CancellationToken token=default)
{
var channel=Channel.CreateUnbounded<string>();
var writer=channel.Writer;
_=Task.Run(async ()=>{
await foreach(var msg in channel.Reader.ReadAllAsync(token))
{
File.AppendAllText(path,msg);
}
},token).ContinueWith(t=>writer.TryComplete(t.Exception));
return writer;
}
....
_logWriter=LogIt(somePath);
Other code can send messages by using WriteAsync or TryWrite, eg :
_logWriter.TryWrite(someMessage);
When we're done, we can call Complete() or TryComplete() on the writer :
_logWriter.TryComplete();
The line
.ContinueWith(t=>writer.TryComplete(t.Exception));
is needed to ensure the channel is closed even if an exception occurs or the cancellation token is signaled.
This may seem too cumbersome at first, but channels allow us to easily run initialization code or carry state from one message to the next. We could open a stream before the loop starts and use it instead of reopening the file each time we call File.AppendAllText, eg :
public ChannelWriter<string> LogIt(string path,CancellationToken token=default)
{
    var channel=Channel.CreateUnbounded<string>();
    var writer=channel.Writer;
    _=Task.Run(async ()=>{
        //***** Can't do this with an ActionBlock ****
        //Open the file once and keep it open for all messages
        using(var file=File.AppendText(path))
        {
            await foreach(var msg in channel.Reader.ReadAllAsync(token))
            {
                file.WriteLine(msg);
                //Or
                //await file.WriteLineAsync(msg);
            }
        }
    },token).ContinueWith(t=>writer.TryComplete(t.Exception));
    return writer;
}
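For completeness, here is a sketch of a bounded variant of the channel approach, where producers wait instead of letting the queue grow without limit. ChannelDemo is a hypothetical name, the capacity of 10 is arbitrary, and a list stands in for the log file. It uses WaitToReadAsync and TryRead rather than await foreach, so it should also compile with older C# versions:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelDemo
{
    public static async Task<List<string>> RunAsync()
    {
        // Bounded: WriteAsync waits while 10 messages are already queued.
        var channel = Channel.CreateBounded<string>(10);
        var received = new List<string>();   // stands in for the log file

        // Single consumer: waits asynchronously for data, no polling or sleeping.
        var consumer = Task.Run(async () =>
        {
            while (await channel.Reader.WaitToReadAsync())
            {
                while (channel.Reader.TryRead(out var msg))
                {
                    received.Add(msg);
                }
            }
        });

        for (int i = 0; i < 25; i++)
        {
            await channel.Writer.WriteAsync($"message {i}");
        }

        channel.Writer.Complete();   // lets WaitToReadAsync return false once drained
        await consumer;
        return received;
    }
}
```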
Definitely Task.Delay is better than Thread.Sleep, because you will not be blocking the thread on the pool, and during the wait the thread on the pool will be available to handle other tasks. Then, you don't need to make your task long-running. Long-running tasks are run in a dedicated thread, and then Task.Delay is meaningless.
Instead, I will recommend a different approach. Just use System.Threading.Timer and make your life simple. Timers are kernel objects that will run their callback on the thread pool, and you will not have to worry about delay or sleep.
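A rough sketch of the timer approach, assuming the queue is still a ConcurrentQueue<string>. QueueDrainer is a hypothetical name, and the Interlocked guard is an addition of this sketch to keep overlapping callbacks from draining at the same time:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class QueueDrainer : IDisposable
{
    private readonly ConcurrentQueue<string> _queue = new ConcurrentQueue<string>();
    private readonly Action<string> _consume;
    private readonly Timer _timer;
    private int _draining;   // 0 = idle, 1 = a callback is currently draining

    public QueueDrainer(Action<string> consume, TimeSpan period)
    {
        _consume = consume;
        // The callback runs on a thread-pool thread every 'period'.
        _timer = new Timer(_ => Drain(), null, period, period);
    }

    public void Enqueue(string msg) => _queue.Enqueue(msg);

    private void Drain()
    {
        // Callbacks may overlap if one takes longer than the period; skip if busy.
        if (Interlocked.Exchange(ref _draining, 1) == 1) return;
        try
        {
            while (_queue.TryDequeue(out var msg))
            {
                _consume(msg);
            }
        }
        finally
        {
            Interlocked.Exchange(ref _draining, 0);
        }
    }

    public void Dispose() => _timer.Dispose();
}
```

No thread is held between ticks, which is the main difference from a loop that sleeps.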
The TPL Dataflow library is the preferred tool for this kind of job. It allows building efficient producer-consumer pairs quite easily, and more complex pipelines as well, while offering a complete set of configuration options. In your case using a single ActionBlock should be enough.
A simpler solution you might consider is to use a BlockingCollection. It has the advantage of not requiring the installation of any package (because it is built-in), and it's also much easier to learn. You don't have to learn more than the methods Add, CompleteAdding, and GetConsumingEnumerable. It also supports cancellation. The drawback is that it's a blocking collection, so it blocks the consumer thread while waiting for new messages to arrive, and the producer thread while waiting for available space in the internal buffer (only if you specify a boundedCapacity in the constructor).
var uiCts = new CancellationTokenSource();
var globalMsgQueue = new BlockingCollection<string>();
var backgroundUiTask = new Task(() =>
{
foreach (var item in globalMsgQueue.GetConsumingEnumerable(uiCts.Token))
{
ConsumeMsgQueueItem(item);
}
}, uiCts.Token);
The BlockingCollection uses a ConcurrentQueue internally as a buffer.
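The producer side is not shown above, so here is a small end-to-end sketch using only Add, CompleteAdding, and GetConsumingEnumerable. BlockingCollectionDemo is a hypothetical name and the bounded capacity of 100 is arbitrary:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BlockingCollectionDemo
{
    public static List<string> Run()
    {
        var consumed = new List<string>();

        // With a bounded capacity, Add blocks while 100 items are waiting.
        using (var queue = new BlockingCollection<string>(boundedCapacity: 100))
        {
            // Consumer: blocks inside the enumerator until an item arrives,
            // and leaves the loop after CompleteAdding once the queue is empty.
            var consumer = Task.Run(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                {
                    consumed.Add(item);
                }
            });

            for (int i = 0; i < 5; i++)
            {
                queue.Add($"message {i}");
            }

            queue.CompleteAdding();   // no more items will be added
            consumer.Wait();          // wait for the consumer to drain the queue
        }

        return consumed;
    }
}
```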
While answering the question "Task.Yield - real usages?", I proposed using Task.Yield to allow a pool thread to be reused by other tasks, in a pattern like this:
CancellationTokenSource cts;
void Start()
{
cts = new CancellationTokenSource();
// run async operation
var task = Task.Run(() => SomeWork(cts.Token), cts.Token);
// wait for completion
// after the completion handle the result/ cancellation/ errors
}
async Task<int> SomeWork(CancellationToken cancellationToken)
{
int result = 0;
bool loopAgain = true;
while (loopAgain)
{
// do something ... means a substantial work or a micro batch here - not processing a single byte
loopAgain = /* check for loop end && */ !cancellationToken.IsCancellationRequested;
if (loopAgain) {
// reschedule the task to the threadpool and free this thread for other waiting tasks
await Task.Yield();
}
}
cancellationToken.ThrowIfCancellationRequested();
return result;
}
void Cancel()
{
// request cancelation
cts.Cancel();
}
But one user wrote:
"I don't think using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern is a good idea. I suggest you ask a separate question if you want to go into details as to why."
Does anybody know why it is not a good idea?
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests to not run the task on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scene).
That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.
Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).
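To illustrate the TaskCreationOptions.LongRunning remark above, here is a small sketch that reports whether a task body ran on a thread-pool thread. LongRunningDemo is a hypothetical name. With the default scheduler in current .NET implementations the LongRunning task is expected to run on a dedicated thread, but that is an implementation detail rather than a documented guarantee:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Runs a trivial body with the given options on the default scheduler
    // and returns whether it executed on a thread-pool thread.
    public static Task<bool> RanOnPoolThreadAsync(TaskCreationOptions options) =>
        Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);
}
```

Comparing `RanOnPoolThreadAsync(TaskCreationOptions.None)` with `RanOnPoolThreadAsync(TaskCreationOptions.LongRunning)` shows the difference on a given runtime.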
After a bit of debate on the issue with other users, who are worried about context switching and its influence on performance, I see what they are worried about.
But by "do something ..." inside the loop I meant a substantial task, usually in the form of a message handler which reads a message from the queue and processes it. The message handlers are usually user defined, and the message bus executes them using some sort of dispatcher. The user can implement a handler which executes synchronously (nobody knows what the user will do), and without Task.Yield that handler will block the thread while those synchronous tasks are processed in a loop.
To back this up, I added tests on GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, the .NET ThreadScheduler with BlockingCollection, and the .NET ThreadScheduler with Threading.Channels.
The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without even a small performance degradation, avoid extremely short tasks; if a task is too short, combine several short tasks into a bigger batch.
[The price of context switch] = [context switch duration] / ([job duration] + [context switch duration]).
With big enough jobs the influence of task switching on performance is negligible, while task cooperation and the responsiveness of the system improve.
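The formula can be tried out with a few numbers. The durations below are made up for illustration only and are not measurements from the tests; SwitchCost and Price are hypothetical names:

```csharp
using System;

public static class SwitchCost
{
    // Fraction of the total time that is spent on the context switch:
    // switch / (job + switch). Both arguments use the same time unit.
    public static double Price(double jobDuration, double switchDuration) =>
        switchDuration / (jobDuration + switchDuration);
}
```

Assuming a 5 microsecond switch, a 30 microsecond job gives 5 / 35, roughly 0.14, while a 5000 microsecond job gives 5 / 5005, roughly 0.001. This is why batching very short jobs makes the switching cost fade.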
For long-running tasks it is better to use a custom scheduler which executes tasks on its own dedicated thread pool (like the WorkStealingTaskScheduler).
For mixed jobs, which can contain short-running CPU-bound parts, asynchronous parts, and long-running parts, it is better to split the task into subtasks.
private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
{
// SHORT SYNCHRONOUS TASK - execute as is on the default thread (from thread pool)
CPU_TASK(message, 50);
// IO BOUND ASYNCH TASK - used as is
await Task.Delay(50);
// BUT WRAP the LONG SYNCHRONOUS TASK inside the Task
// which is scheduled on the custom thread pool
// (to save threadpool threads)
await Task.Factory.StartNew(() => {
CPU_TASK(message, 100000);
}, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
}
I need your help with the following. For nearly a month, I have been reading about Tasks and async.
I wanted to try out my newly acquired knowledge in a simple Web API project. I have the following methods, and both of them work as expected:
public HttpResponseMessage Get()
{
var data = _userServices.GetUsers();
return Request.CreateResponse(HttpStatusCode.OK, data);
}
public async Task<HttpResponseMessage> Get()
{
var data = _userServices.GetUsers();
return await Task<HttpResponseMessage>.Factory.StartNew(() =>
{
return Request.CreateResponse(HttpStatusCode.OK, data);
});
}
So, the question: I have used Fiddler to see what the difference is between these two. The async one is a little faster, but apart from that, what is the real benefit of implementing something like that in a Web API?
As others have pointed out, the point of async on ASP.NET is that it frees up one of the ASP.NET thread pool threads. This works great for naturally-asynchronous operations such as I/O-bound operations because that's one less thread on the server (there is no thread that is "processing" the async operation, as I explain on my blog). Thus, the primary benefit of async on the server side is scalability.
However, you want to avoid Task.Run (and, even worse, Task.Factory.StartNew) on ASP.NET. I call this "fake asynchrony" because they're just doing synchronous/blocking work on a thread pool thread. They're useful in UI apps where you want to push work off the UI thread so the UI remains responsive, but they should (almost) never be used on ASP.NET or other server apps.
Using Task.Run or Task.Factory.StartNew on ASP.NET will actually decrease your scalability. They will cause some unnecessary thread switches. For longer-running operations, you could end up throwing off the ASP.NET thread pool heuristics, causing additional threads to be created and later destroyed needlessly. I explore these performance problems step-by-step in another blog post.
So, you need to think about what each action is doing, and whether any of that should be async. If it should, then that action should be async. In your case:
public HttpResponseMessage Get()
{
var data = _userServices.GetUsers();
return Request.CreateResponse(HttpStatusCode.OK, data);
}
What exactly is Request.CreateResponse doing? It's just creating a response object. That's it - just a fancy new. There's no I/O going on there, and it certainly isn't something that needs to be pushed off to a background thread.
However, GetUsers is much more interesting. That sounds more like a data read, which is I/O-based. If your backend can scale (e.g., Azure SQL / Tables / etc), then you should look at making that async first, and once your service is exposing a GetUsersAsync, then this action could become async too:
public async Task<HttpResponseMessage> Get()
{
var data = await _userServices.GetUsersAsync();
return Request.CreateResponse(HttpStatusCode.OK, data);
}
Using async on your server can dramatically improve scalability, as it frees up the thread serving the request to handle other requests while the async operation is in progress. For example, in a synchronous IO operation the thread would be suspended, doing nothing until the operation completes, and would not be available to serve another request.
That being said, using Task.Factory.StartNew runs the work on another thread, so you don't get the scalability benefits at all. Your original thread can be reused, but you have offloaded the work to another thread, so there is no net benefit. In fact there is a cost of switching to another thread, but that is minimal.
Truly asynchronous operations do not start a thread and I would look to see if such an operation exists, or if one can be written for Request.CreateResponse. Then your code would be much more scalable. If not, you are better off sticking with the synchronous approach.
It makes more sense when the call involves major IO operations.
Yes, async is faster because it frees up the request thread while the operation is being performed. Thus, from the web server's point of view, you are giving a thread back to the pool, where the server can use it for any future calls coming through.
So, for example, when you are performing a search operation on SQL Server, you might want to go async and see the performance benefit.
It is good for scalability that involves multiple servers.
For example, when SearchRecordAsync sends its SQL to the database, it returns an incomplete task, and when the request hits the await, it returns the request thread to the thread pool. Later, when the DB operation completes, a request thread is taken from the thread pool and used to continue the request.
Even if you are not using SQL operations, let's say you want to send an email to 10 people. In this case async also makes more sense.
Async is also very handy for showing the progress of a long-running event, so the user still gets an active GUI while the task runs in the background.
To understand, please have a look at this sample.
Here I am trying to initiate a task called send mail. In the interim I want to update the database while the send mail task runs in the background.
Once the database update has happened, the program waits for the send mail task to complete. With this approach it is quite clear that I can run a task in the background and still proceed on the original (main) thread.
using System;
using System.Threading;
using System.Threading.Tasks;
public class Program
{
public static void Main()
{
Console.WriteLine("Starting Send Mail Async Task");
Task task = new Task(SendMessage);
task.Start();
Console.WriteLine("Update Database");
UpdateDatabase();
// Block until the background send mail task has finished
// (replaces a busy-wait loop that polled task.Status).
task.Wait();
}
public static async void SendMessage()
{
// Calls to TaskOfTResult_MethodAsync
Task<bool> returnedTaskTResult = MailSenderAsync();
bool result = await returnedTaskTResult;
if (result)
{
UpdateDatabase();
}
Console.WriteLine("Mail Sent!");
}
private static void UpdateDatabase()
{
for (var i = 1; i < 1000; i++) ;
Console.WriteLine("Database Updated!");
}
private static async Task<bool> MailSenderAsync()
{
Console.WriteLine("Send Mail Start.");
for (var i = 1; i < 1000000000; i++) ;
return true;
}
}
I've got a NServiceBus host that goes and downloads a whole bunch of data once a message comes through about a particular users account. One data file is about 3Mb (myob - via a webservice call) and another is about 2Mb (restful endpoint, quite fast!). To avoid waiting around for long, I've wrapped the two download calls like this:
var myobBlock = Task.Factory.StartNew(() => myobService.GetDataForUser(accountId, datablockId, CurrencyFormat.IgnoreValidator));
var account = Task.Factory.StartNew(() => accountService.DownloadMetaAccount(accountId, securityContext));
Task.WaitAll(myobBlock, account);
var myobData = myobBlock.Result;
var accountData = account.Result;
//...Process AccountData Object using myobData object
I'm wondering what the benefits of using the new async/await patterns are here compared to the TPL-esque method I've got above. Reading Stephen Cleary's notes, it seems that the above would cause the thread to sit there waiting, whereas async/await would continue and release the thread for other work.
How would you rewrite that within the context of async/await, and would it be beneficial? We have lots of accounts to process, but it's one MSMQ message per account (end of FY reporting) or per request (ad hoc, when a customer calls up and wants their report).
The benefit of using async/await is that given a true async api (One which doesn't call sync methods over async using Task.Run and the likes, but does true async I/O work) you can avoid the allocation of any unnecessary Threads which simply waste resources only to wait on blocking I/O operations.
Let's imagine both your service methods exposed an async API; you could do the following instead of using two ThreadPool threads:
var myobBlock = myobService.GetDataForUserAsync(accountId, datablockId, CurrencyFormat.IgnoreValidator);
var account = accountService.DownloadMetaAccountAsync(accountId, securityContext);
// await till both async operations complete
await Task.WhenAll(myobBlock, account);
What will happen is that execution will yield back to the calling method until both tasks complete. When they do, continuation will resume via IOCP onto the assigned SynchronizationContext if needed.
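A self-contained sketch of that pattern, including how the results are read afterwards. The two local methods are hypothetical stand-ins for the service calls, with Task.Delay simulating I/O latency; WhenAllDemo and the returned values are made up for the example:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-ins for the two downloads.
    static async Task<string> GetDataForUserAsync()
    {
        await Task.Delay(200);
        return "myob data";
    }

    static async Task<int> DownloadMetaAccountAsync()
    {
        await Task.Delay(200);
        return 42;
    }

    public static async Task<(string data, int account, TimeSpan elapsed)> RunAsync()
    {
        var sw = Stopwatch.StartNew();

        // Both operations are started before either one is awaited,
        // so they overlap instead of running one after the other.
        Task<string> myobBlock = GetDataForUserAsync();
        Task<int> account = DownloadMetaAccountAsync();

        await Task.WhenAll(myobBlock, account);

        // Both tasks have completed, so these awaits do not wait again.
        string myobData = await myobBlock;
        int accountData = await account;

        return (myobData, accountData, sw.Elapsed);
    }
}
```

The elapsed time should be close to the longer of the two delays rather than their sum, and no thread is blocked while waiting.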
I'm making a port of the AKKA framework for .NET (don't take this too seriously yet; it is a weekend hack of the Actor part of it right now).
I'm having some problems with the "Future" support in it.
In Java/Scala Akka, Futures are to be awaited synchronously with an Await call, much like the .NET Task.Wait().
My goal is to support true async await for this.
It works right now, but the continuation is executed on the wrong thread in my current solution.
This is the result when passing a message to one of my actors that contain an await block for a future.
As you can see, the actor always executes on the same thread, while the await block executes on a random threadpool thread.
actor thread: 6
await thread 10
actor thread: 6
await thread 12
actor thread: 6
actor thread: 6
await thread 13
...
The actor gets a message using a DataFlow BufferBlock<Message>
Or rather, I use RX over the bufferblock to subscribe to messages.
It is configured like this:
var messages = new BufferBlock<Message>()
{
BoundedCapacity = 100,
TaskScheduler = TaskScheduler.Default,
};
messages.AsObservable().Subscribe(this);
So far so good.
However, the problem appears when I await a future result, like so:
protected override async void OnReceive(IMessage message)
{
....
var result = await Ask(logger, m);
// This is not executed on the same thread as the above code
result.Match()
.With<SomeMessage>(t => {
Console.WriteLine("await thread {0}",
System.Threading.Thread.CurrentThread.GetHashCode());
})
.Default(_ => Console.WriteLine("Unknown message"));
...
I know this is normal behavior of async await, but I really must ensure that only one thread has access to my actor.
I don't want the future to run synchronously; I want it to run async just like normal, but I want the continuation to run on the same thread as the message processor/actor does.
My code for the future support looks like this:
public Task<IMessage> Ask(ActorRef actor, IMessage message)
{
TaskCompletionSource<IMessage> result =
new TaskCompletionSource<IMessage>();
var future = Context.ActorOf<FutureActor>(name : Guid.NewGuid().ToString());
// once this object gets a response,
// we set the result for the task completion source
var futureActorRef = new FutureActorRef(result);
future.Tell(new SetRespondTo(), futureActorRef);
actor.Tell(message, future);
return result.Task;
}
Any ideas what I can do to force the continuation to run on the same thread that started the above code?
I'm making a port of the AKKA framework for .NET
Sweet. I went to an Akka talk at CodeMash '13 despite having never touched Java/Scala/Akka. I saw a lot of potential there for a .NET library/framework. Microsoft is working on something similar, which I hope will eventually be made generally available (it's currently in a limited preview).
I suspect that staying in the Dataflow/Rx world as much as possible is the easier approach; async is best when you have asynchronous operations (with a single start and single result for each operation), while Dataflow and Rx work better with streams and subscriptions (with a single start and multiple results). So my first gut reaction is to either link the buffer block to an ActionBlock with a specific scheduler, or use ObserveOn to move the Rx notifications to a specific scheduler, instead of trying to do it on the async side. Of course I'm not really familiar with the Akka API design, so take that with a grain of salt.
Anyway, my async intro describes the only two reliable options for scheduling await continuations: SynchronizationContext.Current and TaskScheduler.Current. If your Akka port is more of a framework (where your code does the hosting, and end-user code is always executed by your code), then a SynchronizationContext may make sense. If your port is more of a library (where end-user code does the hosting and calls your code as necessary), then a TaskScheduler would make more sense.
There aren't many examples of a custom SynchronizationContext, because that's pretty rare. I do have an AsyncContextThread type in my AsyncEx library which defines both a SynchronizationContext and a TaskScheduler for that thread. There are several examples of custom TaskSchedulers, such as the Parallel Extensions Extras which has an STA scheduler and a "current thread" scheduler.
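To make the SynchronizationContext option concrete, here is a minimal sketch of a context that pumps all posted work on one dedicated thread, so that await continuations return to that thread. SingleThreadContext and ActorThreadDemo are hypothetical names; this is not the AsyncContextThread type mentioned above, and it omits shutdown and error handling that a real implementation would need:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-threaded context: every Post runs on the same thread.
public sealed class SingleThreadContext : SynchronizationContext, IDisposable
{
    private readonly BlockingCollection<(SendOrPostCallback callback, object state)> _queue =
        new BlockingCollection<(SendOrPostCallback callback, object state)>();
    private readonly Thread _thread;

    public SingleThreadContext()
    {
        _thread = new Thread(Run) { IsBackground = true };
        _thread.Start();
    }

    public int ThreadId => _thread.ManagedThreadId;

    // await captures this context and posts its continuation here.
    public override void Post(SendOrPostCallback d, object state) => _queue.Add((d, state));

    private void Run()
    {
        // Make this context current on the pump thread so awaits capture it.
        SetSynchronizationContext(this);
        foreach (var (callback, state) in _queue.GetConsumingEnumerable())
        {
            callback(state);
        }
    }

    public void Dispose() => _queue.CompleteAdding();
}

public static class ActorThreadDemo
{
    // Reports the thread id before and after an await inside the context.
    public static Task<(int before, int after)> RunAsync(SingleThreadContext context)
    {
        var tcs = new TaskCompletionSource<(int before, int after)>();
        context.Post(async _ =>
        {
            int before = Environment.CurrentManagedThreadId;
            await Task.Delay(10);   // completes elsewhere, continuation is posted back
            int after = Environment.CurrentManagedThreadId;
            tcs.SetResult((before, after));
        }, null);
        return tcs.Task;
    }
}
```

In this sketch both thread ids equal `context.ThreadId`, because the continuation after the await is queued through Post instead of running on whichever thread completed the delay.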
The task scheduler decides whether to run a task on a new thread or on the current thread.
There is an option to force running it on a new thread, but none to force it to run on the current thread.
But there is a method, Task.RunSynchronously(), which runs the task synchronously on the current TaskScheduler.
Also, if you are using async/await, there is already a similar question on that.