async await vs TaskFactory.StartNew and WaitAll - c#

I've got an NServiceBus host that downloads a whole bunch of data once a message comes through about a particular user's account. One data file is about 3 MB (myob, via a web service call) and another is about 2 MB (RESTful endpoint, quite fast!). To avoid waiting around for long, I've wrapped the two download calls like this:
var myobBlock = Task.Factory.StartNew(() => myobService.GetDataForUser(accountId, datablockId, CurrencyFormat.IgnoreValidator));
var account = Task.Factory.StartNew(() => accountService.DownloadMetaAccount(accountId, securityContext));
Task.WaitAll(myobBlock, account);
var myobData = myobBlock.Result;
var accountData = account.Result;
//...Process AccountData Object using myobData object
I'm wondering what the benefits of using the new async/await pattern are here, compared to the TPL-esque method I've got above. Reading Stephen Cleary's notes, it seems that the above would cause the thread to sit there waiting, whereas async/await would release the thread for other work.
How would you rewrite that with async/await, and would it be beneficial? We have lots of accounts to process, but it's one MSMQ message per account (end-of-FY reporting) or per request (ad hoc, when a customer calls up and wants their report).

The benefit of using async/await is that, given a true async API (one that does real asynchronous I/O rather than wrapping synchronous methods with Task.Run and the like), you avoid tying up threads that would otherwise waste resources simply waiting on blocking I/O operations.
Let's imagine both your service methods exposed an async API. You could then do the following instead of using two ThreadPool threads:
var myobBlock = myobService.GetDataForUserAsync(accountId, datablockId, CurrencyFormat.IgnoreValidator);
var account = accountService.DownloadMetaAccountAsync(accountId, securityContext);
// await till both async operations complete
await Task.WhenAll(myobBlock, account);
What will happen is that execution will yield back to the calling method until both tasks complete. When they do, continuation will resume via IOCP onto the assigned SynchronizationContext if needed.
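A minimal sketch of how the results could be read once both tasks have finished. The two async methods here are hypothetical stand-ins for the service calls (the real services may not expose async versions), and Task.Delay only simulates the I/O:

```csharp
using System;
using System.Threading.Tasks;

public static class DownloadSketch
{
    // Hypothetical stand-ins for myobService.GetDataForUserAsync and
    // accountService.DownloadMetaAccountAsync; Task.Delay simulates I/O.
    public static async Task<string> GetMyobDataAsync(int accountId)
    {
        await Task.Delay(200);
        return $"myob:{accountId}";
    }

    public static async Task<string> GetAccountAsync(int accountId)
    {
        await Task.Delay(100);
        return $"account:{accountId}";
    }

    public static async Task<(string Myob, string Account)> DownloadBothAsync(int accountId)
    {
        // Start both operations before awaiting either, so they overlap.
        Task<string> myobTask = GetMyobDataAsync(accountId);
        Task<string> accountTask = GetAccountAsync(accountId);

        // No thread is blocked while the downloads are in flight.
        await Task.WhenAll(myobTask, accountTask);

        // Both tasks are complete here, so awaiting them returns immediately.
        return (await myobTask, await accountTask);
    }
}
```

Because both tasks are started before the first await, the total wait is roughly the duration of the slower call rather than the sum of the two.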

Related

Task based vs. thread based Watchdog - but async needed

We're using watchdogs to determine whether a connected system is still alive or not.
In the previous code we used TCP directly and ran the watchdog in a separate thread. Now a new service is used that provides its data via gRPC.
For that we tried using the async interface with tasks but a task based watchdog will fail.
I wrote a small DEMO that abstracts the code and illustrates the problem. You can switch between task based watchdog and thread based watchdog by commenting out line 18 with //.
The demo contains this code that causes the problem:
async Task gRPCSendAsync(CancellationToken cancellationToken = default) => await Task.Yield();
async Task gRPCReceiveAsync(CancellationToken cancellationToken = default) => await Task.Yield();
var start = DateTime.UtcNow;
await gRPCSendAsync(cancellationToken).ConfigureAwait(false);
await gRPCReceiveAsync(cancellationToken).ConfigureAwait(false);
var end = DateTime.UtcNow;
if ((end - start).TotalMilliseconds >= 100)
// signal failing
If this code is used in Task.Run, it will signal failure when the application has a lot of CPU work to do in other tasks.
If a dedicated thread is used, the watchdog works as expected and no problem is raised.
I do understand the problem: all code after an await may be queued to the thread pool (unless the awaited operation has already finished or does not contain a "real" await). But the thread pool has other things to do, so it takes too long to finish the method.
Yes the simple answer is: USE THREAD.
But using a thread limits us to synchronous methods. There is no way to call an async method from a dedicated thread and stay on it: I created another sample that shows that all code after the first await is queued to the thread pool, so CallAsync().Wait() will not work. (By the way, that issue is discussed in more detail here.)
We're having a lot of async code that may be used within such time critical operations.
So the question is: is there any way to perform those operations using tasks with async/await?
Maybe I'm completely wrong and a task-based watchdog should be built very differently.
thoughts
I was thinking about System.Threading.Timer but the problem of async sending and async receiving will cause that problem anyways.
Here is how you could use Stephen Cleary's AsyncContext class from the Nito.AsyncEx.Context package, in order to constrain an asynchronous workflow to a dedicated thread:
await Task.Factory.StartNew(() =>
{
AsyncContext.Run(async () =>
{
await DoTheWatchdogAsync(watchdogCts.Token);
});
}, TaskCreationOptions.LongRunning);
The call to AsyncContext.Run will block until the supplied asynchronous operation is completed. All asynchronous continuations created by the DoTheWatchdogAsync will be processed internally by the AsyncContext on the current thread. In the above example the current thread is not a ThreadPool thread, because of the flag TaskCreationOptions.LongRunning used in the construction of the wrapper Task. You could confirm this by querying the property Thread.CurrentThread.IsThreadPoolThread.
If you prefer you could use a traditional Thread constructor instead of the somewhat unconventional Task.Factory.StartNew+LongRunning.
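To illustrate the mechanism without the Nito.AsyncEx dependency, here is a rough sketch of a single-threaded context run on a plain Thread. SingleThreadContext and WatchdogSketch are hypothetical names invented for this example; this is a simplified stand-in, not how AsyncContext is actually implemented:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical minimal stand-in for AsyncContext: runs continuations on the calling thread.
public sealed class SingleThreadContext : SynchronizationContext
{
    private readonly BlockingCollection<(SendOrPostCallback Callback, object? State)> _queue = new();

    // Awaits that capture this context post their continuations here.
    public override void Post(SendOrPostCallback d, object? state) => _queue.Add((d, state));

    public static void Run(Func<Task> asyncMain)
    {
        var previous = Current;
        var context = new SingleThreadContext();
        SetSynchronizationContext(context);
        try
        {
            Task task = asyncMain();
            // Stop the pump once the workflow has finished.
            task.ContinueWith(_ => context._queue.CompleteAdding(), TaskScheduler.Default);
            foreach (var (callback, state) in context._queue.GetConsumingEnumerable())
                callback(state);
            task.GetAwaiter().GetResult(); // rethrow any exception from the workflow
        }
        finally
        {
            SetSynchronizationContext(previous);
        }
    }
}

public static class WatchdogSketch
{
    // Returns true if the code after each await ran on the dedicated thread.
    public static bool StaysOnDedicatedThread()
    {
        bool sameThread = false;
        var thread = new Thread(() => SingleThreadContext.Run(async () =>
        {
            int before = Environment.CurrentManagedThreadId;
            await Task.Yield();   // continuation is posted to SingleThreadContext
            await Task.Delay(10); // no ConfigureAwait(false), so the context is captured
            sameThread = before == Environment.CurrentManagedThreadId;
        }))
        { IsBackground = true };
        thread.Start();
        thread.Join();
        return sameThread;
    }
}
```

Note that any await using ConfigureAwait(false), as in the question's snippet, opts out of the captured context, so its continuation can still land on the thread pool.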

Log data into cassandra using c#

I am trying to log data into Cassandra using C#. My aim is to log as many data points as I can in 200 ms.
I am trying to save the time, a random key, and a value within 200 ms. Please see the code for reference. The problem is: how can I execute the session after the while loop?
Cluster cluster = Cluster.Builder()
.AddContactPoint("127.0.0.1")
.Build();
ISession session = cluster.Connect("log"); //keyspace to connect with
var ps = session.Prepare("Insert into logcassandra(nanodate, key, value) values (?,?,?)");
var stop = Stopwatch.StartNew();
var i = 0;
while(stop.ElapsedMilliseconds <= 200)
{
i++;
var statement = ps.Bind(nanoTime(),"key"+i,"value"+i);
session.ExecuteAsync(statement);
}
Please prefer System.Threading.Timer with a TimerCallback over Stopwatch.
EDIT: (reply to the comment)
Hi, I'm not sure what you want to achieve, but here are some general concepts about async calls and parallel execution. In the .NET world, async is mainly used for non-blocking I/O operations, which means your caller thread will not wait for the response of the I/O driver. In other words, you initiate an I/O operation and dispatch this work to a "thing" outside the .NET ecosystem, which gives you back a future (a Task). The driver acknowledges that it received the request and promises to process it once it has free capacity.
That Task represents asynchronous work that either succeeds or fails. But because you are calling it asynchronously, you are not waiting for its result (not blocking the caller thread to wait for external work); you move on to the next statement instead. Eventually the operation finishes, and at that time the driver notifies the Task that the requested operation has completed. (The Task can be seen as the primary communication channel between the caller and the callee.)
In your case you are using a fire-and-forget style async call. That means you are firing off a lot of asynchronous I/O operations and never processing their results. You don't know whether any of them failed. But you have asked Cassandra to do a lot of stuff. Your time measurement covers only the firing off of jobs, which means you have no idea how many of these jobs have finished.
If you chose to await your async calls, your while loop would execute serially. You would fire off a job and could not move on to the next iteration because you are awaiting it, so your caller thread would move one level up its call stack and check whether it can proceed with something there. If there is an await at that level as well, it moves one level higher, and so on:
while(stop.ElapsedMilliseconds <= 200)
{
await session.ExecuteAsync(statement);
}
If you want parallel rather than serial execution, you can create as many jobs as you need and await them as a whole. That's where Task.WhenAll comes into play. You fire off a lot of jobs and await a single task that tracks all of the others.
var cassandraCalls = new List<Task>();
cassandraCalls.AddRange(Enumerable.Range(0, 100).Select(_ => session.ExecuteAsync(statement)));
await Task.WhenAll(cassandraCalls);
But this code will run until all of the jobs are finished. If you want to constrain the whole execution time, you should use some cancellation mechanism. Task.WhenAll does not accept a CancellationToken, but you can overcome this limitation in several ways. The simplest solution is a combination of Task.Delay and Task.WhenAny: Task.Delay provides the timeout, and Task.WhenAny awaits either your Cassandra calls or the timeout, whichever completes first.
var cassandraCalls = new List<Task>();
cassandraCalls.AddRange(Enumerable.Range(0, 100).Select(_ => session.ExecuteAsync(statement)));
await Task.WhenAny(Task.WhenAll(cassandraCalls), Task.Delay(1000));
In this way, you fire off as many jobs as you want and, depending on your driver, they may be executed in parallel or concurrently. You await either all of them finishing or a certain amount of time elapsing. When the WhenAny task finishes, you can examine the state of the jobs by simply iterating over cassandraCalls:
foreach (var call in cassandraCalls)
{
Console.WriteLine(call.IsCompleted);
}
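The timeout pattern can be tried without a Cassandra cluster. In this sketch, FakeInsertAsync is a hypothetical stand-in for session.ExecuteAsync(statement), and the delays are made-up values that simulate fast and slow writes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical stand-in for session.ExecuteAsync(statement); the delay simulates a write.
    private static Task FakeInsertAsync(int delayMs) => Task.Delay(delayMs);

    // Returns how many of the simulated inserts finished successfully within the budget.
    public static async Task<int> CountCompletedWithinAsync(IEnumerable<int> delaysMs, int budgetMs)
    {
        // ToList starts every call exactly once and lets us inspect the tasks afterwards.
        List<Task> calls = delaysMs.Select(FakeInsertAsync).ToList();

        // Completes when either all inserts or the timeout finishes, whichever is first.
        await Task.WhenAny(Task.WhenAll(calls), Task.Delay(budgetMs));

        // RanToCompletion excludes faulted, canceled, and still-running tasks.
        return calls.Count(t => t.Status == TaskStatus.RanToCompletion);
    }
}
```

Keep in mind that the timeout only stops the waiting: calls that have not finished are not canceled and keep running in the background.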
I hope this explanation helped you a bit.

Using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern

Answering the question: Task.Yield - real usages?
I proposed to use Task.Yield allowing a pool thread to be reused by other tasks. In such pattern:
CancellationTokenSource cts;
void Start()
{
cts = new CancellationTokenSource();
// run async operation
var task = Task.Run(() => SomeWork(cts.Token), cts.Token);
// wait for completion
// after the completion handle the result/ cancellation/ errors
}
async Task<int> SomeWork(CancellationToken cancellationToken)
{
int result = 0;
bool loopAgain = true;
while (loopAgain)
{
// do something ... means a substantial work or a micro batch here - not processing a single byte
loopAgain = /* check for loop end && */ !cancellationToken.IsCancellationRequested;
if (loopAgain) {
// reschedule the task to the threadpool and free this thread for other waiting tasks
await Task.Yield();
}
}
cancellationToken.ThrowIfCancellationRequested();
return result;
}
void Cancel()
{
// request cancelation
cts.Cancel();
}
But one user wrote
I don't think using Task.Yield to overcome ThreadPool starvation while
implementing producer/consumer pattern is a good idea. I suggest you
ask a separate question if you want to go into details as to why.
Does anybody know why it is not a good idea?
There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.
Using ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests to not run the task on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scene).
That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.
Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).
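As one illustration of the existing tools mentioned above, here is a small producer/consumer sketch using System.Threading.Channels. The class name, capacity, and workload are arbitrary choices for the example, not taken from the question:

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelSketch
{
    public static async Task<int> SumThroughChannelAsync(int itemCount)
    {
        // Bounded capacity applies back-pressure: the producer waits asynchronously when full.
        var channel = Channel.CreateBounded<int>(capacity: 16);

        Task producer = Task.Run(async () =>
        {
            for (int i = 1; i <= itemCount; i++)
                await channel.Writer.WriteAsync(i);
            channel.Writer.Complete(); // signals that no more items will arrive
        });

        int sum = 0;
        // The consumer awaits new items instead of polling or yielding in a loop.
        await foreach (int item in channel.Reader.ReadAllAsync())
            sum += item;

        await producer;
        return sum;
    }
}
```

Here the consumer gives up its thread only when the channel is empty, so there is no need for a manual await Task.Yield() on every iteration.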
After a bit of debate on the issue with other users, who are worried about context switching and its influence on performance, I see what they are worried about.
But I meant: do something ... inside the loop to be a substantial task - usually in the form of a message handler which reads a message from the queue and processes it. The message handlers are usually user defined and the message bus executes them using some sort of dispatcher. The user can implement a handler which executes synchronously (nobody knows what the user will do), and without Task.Yield that will block the thread to process those synchronous tasks in a loop.
To back this up with more than words, I added tests to GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, .NET ThreadScheduler with BlockingCollection and .NET ThreadScheduler with Threading.Channels.
The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without even a small performance degradation, avoid extremely short tasks; if a task is too short, combine several short tasks into a bigger batch.
[The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]).
In that case the influence of the switching the tasks is negligible on the performance. But it adds a better task cooperation and responsiveness of the system.
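The formula can be checked with a few lines of code. The durations used in the usage note are hypothetical numbers chosen for illustration, not measurements from the tests:

```csharp
public static class SwitchCost
{
    // Fraction of total time spent on the context switch, per the formula above.
    public static double Price(double switchMicros, double jobMicros)
        => switchMicros / (jobMicros + switchMicros);
}
```

With an assumed 5 microsecond switch, SwitchCost.Price(5, 5) gives 0.5 (half the time is overhead), while batching the work into a 995 microsecond job gives SwitchCost.Price(5, 995) = 0.005, or 0.5%.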
For long-running tasks it is better to use a custom scheduler that executes tasks on its own dedicated thread pool (like the WorkStealingTaskScheduler).
For mixed jobs, which can contain short-running CPU-bound, asynchronous, and long-running parts, it is better to split the task into subtasks:
private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
{
// SHORT SYNCHRONOUS TASK - execute as is on the default thread (from thread pool)
CPU_TASK(message, 50);
// IO BOUND ASYNCH TASK - used as is
await Task.Delay(50);
// BUT WRAP the LONG SYNCHRONOUS TASK inside the Task
// which is scheduled on the custom thread pool
// (to save threadpool threads)
await Task.Factory.StartNew(() => {
CPU_TASK(message, 100000);
}, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
}

Background work in WCF Rest Service

I have a WCF Rest service that exposes a web method which should start off a long running process and then immediately return an id representing the task that can be used to track the status of the task.
[WebGet]
public Task<Guid> LongRunningProcess()
{
var taskId = Guid.NewGuid();
var task = Task.Factory.StartNew(() =>
{
//Perform long running task
});
task.ContinueWith(completedTask =>
{
//Send a notification to the client that the task has completed.
});
return taskId;
}
My question is: is this the correct way to do it, or is there a better and more lightweight approach?
My understanding is that if your work is CPU bound, you are better off executing the work synchronously. With your approach the request will get parked and the original request thread will get freed up, but then you are handing off the work to another thread, and the request doesn't complete until that thread is finished. You might as well do the work in the original thread.
If you had some IO in there it would make sense to make that asynchronous as asynchronous IO does not use a thread and it would free up your request thread to handle other requests, which improves your scalability.
UPDATE
I think the approach you are taking is good, but given that you are using .NET 4.5 I'd use async-await as it results in simpler code. I would then use the asynchronous API of IO operations and await its result. Eg:
[WebGet]
public async Task<Guid> LongRunningProcess()
{
var taskId = Guid.NewGuid();
// IO bound operation
var dbResult = await readFromDbAsync();
// CPU bound?
generateReport(dbResult);
// IO bound operation
await sendNotification();
return taskId;
}
If you are not familiar with async-await, I have written an intro to it here.
What you sketched up there is (with minor correction) a way to achieve what you want to do. The harder part is the notification of clients (we do so using SignalR hubs successfully, but the exact mechanism is up to you).
The minor correction I spoke about is that the return type of your method should be just Guid in your code above.
Some notes:
Performance-wise the TPL scales pretty well (IMO), but on a larger scale you may want to be able to distribute those long-running tasks over multiple servers, etc.
For this case I'd recommend having a look at distributed job queues (Resque, for example; .NET ports exist), which are perfect for this kind of use case.
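For the single-server case, the "start the work and return an id" idea could be sketched as below. JobTracker is a hypothetical helper invented for this example, and the in-memory dictionary is an assumption that only holds while the process stays alive:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class JobTracker
{
    private static readonly ConcurrentDictionary<Guid, Task> Jobs =
        new ConcurrentDictionary<Guid, Task>();

    // Starts the work in the background and returns an id immediately.
    public static Guid Start(Action work)
    {
        Guid id = Guid.NewGuid(); // new Guid() would always be the all-zero Guid.Empty
        Jobs[id] = Task.Run(work);
        return id;
    }

    // Reports the current status, or null for an unknown id.
    public static TaskStatus? GetStatus(Guid id)
        => Jobs.TryGetValue(id, out var job) ? job.Status : (TaskStatus?)null;
}
```

A status endpoint could then call JobTracker.GetStatus(id). Since the tracker lives in process memory, its state is lost if the host process restarts, which is one reason a durable job queue is attractive at larger scale.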

async Task<HttpResponseMessage> Get VS HttpResponseMessage Get

I need your help with the following. For nearly a month, I have been reading about Tasks and async.
I wanted to apply my newly acquired knowledge in a simple Web API project. I have the following methods, and both of them work as expected:
public HttpResponseMessage Get()
{
var data = _userServices.GetUsers();
return Request.CreateResponse(HttpStatusCode.OK, data);
}
public async Task<HttpResponseMessage> Get()
{
var data = _userServices.GetUsers();
return await Task<HttpResponseMessage>.Factory.StartNew(() =>
{
return Request.CreateResponse(HttpStatusCode.OK, data);
});
}
So, the question: I have tried using Fiddler to see the difference between these two. The async one is a little faster, but apart from that, what is the real benefit of implementing something like that in a Web API?
As others have pointed out, the point of async on ASP.NET is that it frees up one of the ASP.NET thread pool threads. This works great for naturally-asynchronous operations such as I/O-bound operations because that's one less thread on the server (there is no thread that is "processing" the async operation, as I explain on my blog). Thus, the primary benefit of async on the server side is scalability.
However, you want to avoid Task.Run (and, even worse, Task.Factory.StartNew) on ASP.NET. I call this "fake asynchrony" because they're just doing synchronous/blocking work on a thread pool thread. They're useful in UI apps where you want to push work off the UI thread so the UI remains responsive, but they should (almost) never be used on ASP.NET or other server apps.
Using Task.Run or Task.Factory.StartNew on ASP.NET will actually decrease your scalability. They will cause some unnecessary thread switches. For longer-running operations, you could end up throwing off the ASP.NET thread pool heuristics, causing additional threads to be created and later destroyed needlessly. I explore these performance problems step-by-step in another blog post.
So, you need to think about what each action is doing, and whether any of that should be async. If it should, then that action should be async. In your case:
public HttpResponseMessage Get()
{
var data = _userServices.GetUsers();
return Request.CreateResponse(HttpStatusCode.OK, data);
}
What exactly is Request.CreateResponse doing? It's just creating a response object. That's it - just a fancy new. There's no I/O going on there, and it certainly isn't something that needs to be pushed off to a background thread.
However, GetUsers is much more interesting. That sounds more like a data read, which is I/O-based. If your backend can scale (e.g., Azure SQL / Tables / etc), then you should look at making that async first, and once your service is exposing a GetUsersAsync, then this action could become async too:
public async Task<HttpResponseMessage> Get()
{
var data = await _userServices.GetUsersAsync();
return Request.CreateResponse(HttpStatusCode.OK, data);
}
Using async on your server can dramatically improve scalability, as it frees up the thread serving the request to handle other requests while the async operation is in progress. For example, in a synchronous I/O operation, the thread would be suspended and doing nothing until the operation completes, and would not be available to serve another request.
That being said, Task.Factory.StartNew runs the work on another thread pool thread, so you don't get the scalability benefits at all. Your original thread can be reused, but you have offloaded the work to another thread, so there is no net benefit. In fact, there is a cost for switching to another thread, but that is minimal.
Truly asynchronous operations do not start a thread and I would look to see if such an operation exists, or if one can be written for Request.CreateResponse. Then your code would be much more scalable. If not, you are better off sticking with the synchronous approach.
It makes more sense where the call is happening with major IO operations.
Yes, async can help, because it frees up the request thread for the time that the operation is being performed. Thus, from the web server's point of view, you are giving a thread back to the pool that can be used by the server for any future calls coming through.
So for e.g. when you are performing a search operation on SQL server, you might want to do async and see the performance benefit.
It is good for scalability that involves multiple servers.
So, for e.g. when the SearchRecordAsync sends its SQL to the database, it returns an incomplete task, and when the request hits the await, it returns the request thread to the thread pool. Later, when the DB operation completes, a request thread is taken from the thread pool and used to continue the request.
Even if you are not using, SQL operations, let say you want to send an email to 10 people. In this case also async makes more sense.
Async is also very handy to show the progress of long event. So user will still get the active GUI, while the task is running at the background.
To understand, please have a look at this sample.
Here I am trying to initiate task called send mail. Interim I want to update database, while the background is performing send mail task.
Once the database update has happened, it is waiting for the send mail task to be completed. However, with this approach it is quite clear that I can run task at the background and still proceed with original (main) thread.
using System;
using System.Threading.Tasks;
public class Program
{
public static async Task Main()
{
Console.WriteLine("Starting Send Mail Async Task");
// Start the send-mail work without awaiting it yet, so Main can continue.
Task sendTask = SendMessageAsync();
Console.WriteLine("Update Database");
UpdateDatabase();
// Await the task instead of busy-waiting on task.Status.
await sendTask;
}
// Returns Task (not async void) so the caller can await it and observe exceptions.
public static async Task SendMessageAsync()
{
bool result = await MailSenderAsync();
if (result)
{
UpdateDatabase();
}
Console.WriteLine("Mail Sent!");
}
private static void UpdateDatabase()
{
for (var i = 1; i < 1000; i++) ;
Console.WriteLine("Database Updated!");
}
private static async Task<bool> MailSenderAsync()
{
Console.WriteLine("Send Mail Start.");
// Task.Delay simulates the I/O of sending; a spin loop with no await would run synchronously.
await Task.Delay(1000);
return true;
}
}
