I have the following WebAPI 2 method:
public HttpResponseMessage ProcessData([FromBody]ProcessDataRequestModel model)
{
    var response = new JsonResponse();
    if (model != null)
    {
        // checks if there are old records to process
        var records = _utilityRepo.GetOldProcesses(model.ProcessUid);
        if (records.Count > 0)
        {
            // there is an active process
            // insert the new process
            _utilityRepo.InsertNewProcess(records[0].ProcessUid);
            response.message = "Process added to ProcessUid: " + records[0].ProcessUid.ToString();
        }
        else
        {
            // if this is a new process then do adjustments rules
            var settings = _utilityRepo.GetSettings(model.Uid);
            // create a new process
            var newUid = Guid.NewGuid();
            // if its a new adjustment
            if (settings.AdjustmentUid == null)
            {
                settings.AdjustmentUid = Guid.NewGuid();
                // create new Adjustment information
                _utilityRepo.CreateNewAdjustment(settings.AdjustmentUid.Value);
            }
            // if adjustment created
            if (_utilityRepo.CreateNewProcess(newUid))
            {
                // insert the new body
                _utilityRepo.InsertNewBody(newUid, model.Body, true);
            }
            // start AWS lambda function timer
            _utilityRepo.AWSStartTimer();
            response.message = "Process created";
        }
        response.success = true;
        response.data = null;
    }
    return Request.CreateResponse(response);
}
The above method can sometimes take 3-4 seconds to process (some DB calls and other calculations) and I don't want the user to wait until all the executions are done.
I would like the user to hit the Web API method and almost immediately get a success response, while the server finishes all the executions.
Any clue on how to implement async/await to achieve this?
If you don't need to return a meaningful response it's a piece of cake. Wrap your method body in a lambda you pass to Task.Run (which returns a Task). No need to use await or async. You just don't await the Task and the endpoint will return immediately.
However if you need to return a response that depends on the outcome of the operation, you'll need some kind of reporting mechanism in place, SignalR for example.
Edit: Based on the comments to the original post, my recommendation would be to wrap the code in await Task.Run(()=>...), i.e., indeed await it before returning. That will allow the long-ish process to run on a different thread asynchronously, but the response will still await the outcome rather than leaving the user in the dark about whether it finished (since you have no control over the UI). You'd have to test it though to see if there's really any performance benefit from doing this. I'm skeptical it'll make much difference.
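To make that concrete, here is a minimal sketch of both variants against the question's action. DoProcessing is a hypothetical helper standing in for the body of the original method; JsonResponse and ProcessDataRequestModel are the question's own types.
public HttpResponseMessage ProcessData([FromBody]ProcessDataRequestModel model)
{
    var response = new JsonResponse();

    // Variant 1: fire-and-forget. The endpoint returns immediately, but nothing
    // reports whether the background work eventually succeeded or failed.
    Task.Run(() => DoProcessing(model));

    // Variant 2: await the background work so the response reflects its outcome.
    // This requires changing the action to: public async Task<HttpResponseMessage> ProcessData(...)
    // await Task.Run(() => DoProcessing(model));

    response.success = true;
    response.message = "Process accepted";
    return Request.CreateResponse(response);
}

private void DoProcessing(ProcessDataRequestModel model)
{
    // the repository calls and calculations from the original method go here
}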
2020-02-14 Edit:
Hooray, my answer's votes are no longer in the negative! I figured having had the benefit of two more years of experience I would share some new observations on this topic.
There's no question that asynchronous background operations running in a web server is a complex topic. But as with most things, there's a naive way of doing it, a "good enough for 99% of cases" way of doing it, and a "people will die (or worse, get sued) if we do it wrong" way of doing it. Things need to be put in perspective.
My original answer may have been a little naive, but to be fair the OP was talking about an API that was only taking a few seconds to finish, and all he wanted to do was save the user from having to wait for it to return. I also noted that the user would not get any report of progress or completion if it is done this way. If it were me, I'd say the user should suck it up for that short of a time. Alternatively, there's nothing that says the client has to wait for the API response before returning control to the user.
But regardless, if you really want to get that 200 right away JUST to acknowledge that the task was initiated successfully, then I still maintain that a simple Task.Run(()=>...) without the await is probably fine in this case. Unless there are truly severe consequences to the user not knowing the API failed, on the off chance that the app pool was recycled or the server restarted during those exact 4 seconds between the API return and its true completion, the user will just be ignorant of the failure and will presumably find out next time they go into the application. Just make sure that your DB operations are transactional so you don't end up in a partial success situation.
Then there's the "good enough for 99% of cases" way, which is what I do in my application. I have a "Job" system which is asynchronous, but not reentrant. When a job is initiated, we do a Task.Run and begin to execute it. The code in the task always holds onto a Job data structure whose ID is returned immediately by the API. The code in the task periodically updates the Job data with status, which is also saved to a database, and checks to see if the Job was cancelled by the user, in which case it wraps up immediately and the DB transaction is rolled back. The user cancels by calling another API which updates said Job object in the database to indicate it should be cancelled. A separate infinite loop periodically polls the job database server side and updates the in-memory Job objects used by the actual running code with any cancellation requests. Fundamentally it's just like any CancellationToken in .NET but it just works via a database and API calls. The front end can periodically poll the server for job status using the ID, or better yet, if they have WebSockets the server pushes job updates using SignalR.
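As a rough sketch of that pattern (JobInfo, _jobStore, DoWorkStep and Log are illustrative names, not from a real library):
public class JobInfo
{
    public Guid Id { get; set; }
    public string Status { get; set; }        // "Running", "Completed", "Failed", "Cancelled"
    public int PercentComplete { get; set; }
}

public Guid StartJob()
{
    var job = new JobInfo { Id = Guid.NewGuid(), Status = "Running" };
    _jobStore.Save(job);                      // persisted so the front end can poll it

    Task.Run(() =>
    {
        try
        {
            for (var step = 0; step < 10; step++)
            {
                // the cancellation flag lives in the database, updated by a separate API
                if (_jobStore.IsCancellationRequested(job.Id))
                {
                    job.Status = "Cancelled";
                    return;                   // the surrounding DB transaction is rolled back
                }

                DoWorkStep(step);             // placeholder for the real work
                job.PercentComplete = (step + 1) * 10;
                _jobStore.Save(job);          // polled by the client, or pushed via SignalR
            }

            job.Status = "Completed";
        }
        catch (Exception ex)
        {
            job.Status = "Failed";
            Log(ex);
        }
        finally
        {
            _jobStore.Save(job);
        }
    });

    return job.Id;                            // returned to the caller immediately
}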
So, what happens if the app domain is lost during the job? Well, first off, every job runs in a single DB transaction, so if it doesn't complete the DB rolls back. Second, when the ASP.NET app restarts, one of the first things it does is check for any jobs that are still marked as running in the DB. These are the zombies that died upon app pool restart but the DB still thinks they're alive. So we mark them as KIA, and send the user an email indicating their job failed and needs to be rerun. It causes inconvenience and a puzzled user from time to time, but it works fine 99% of the time. Theoretically, we could even automatically restart the job on server startup if we wanted to, but we feel it's better to make that a manual process for a number of case-specific reasons.
Finally, there's the "people will die (or worse, get sued) if we get it wrong" way. This is what some of the other comments are more directed to. This is where you have to break down all jobs into small atomic transactions that are tracked in a database at every step, and which can be picked up by any server (the same or maybe another server in a farm) at any time. If it's really top notch, multiple servers can even work on the same job concurrently, depending on what it is. It requires carefully coding every background operation with this in mind, constantly updating a database with your progress, dealing with concurrent changes to the database (because now the entire operation is no longer a single atomic transaction), etc. Needless to say, it's a LOT of effort. Yeah, it would be great if it worked this way. It would be great if every app did everything to this level of perfection. I also want a toilet made out of solid gold, but it's just not in the cards now is it?
So my $0.02 is, again, let's have some perspective. Do the cost benefit analysis and unless you're doing something where lives or lots of money is at stake, aim for what works perfectly well 99%+ of the time and only causes minor inconvenience when it doesn't work perfectly.
Related
It's my understanding that controllers get destroyed after an HTTP request is made. Are there any assurances that the .NET Core runtime will wait until all threads initiated in an async action have terminated/ended before destroying the controller instance?
I have code below with an async controller action that calls an async function. I don't need to know if the async function actually succeeds or not (e.g. sending the email), I just want to make sure that it attempts to. My fear is that the .NET Core runtime will possibly kill the thread in the middle of execution.
Spoiler alert: I ran the code below in my development environment and it does send the email every time (I used a real email address). But I don't know whether the behavior would change in a production environment.
Any thoughts/guidance?
[HttpGet]
public async Task SendEmail()
{
    // If I would prefix this with 'await' the controller
    // action doesn't terminate until the async function returns
    this.InternalSendEmail();
}

private async Task InternalSendEmail()
{
    try
    {
        await this.Email.Send("to@example.com", "Interesting subject", "Captivating content");
    }
    catch (Exception exc)
    {
        Log(exc);
    }
}
What happens to the controller instance - nothing you can't manage
First, when we talk about destroying the controller instance, let's be more precise. The instance won't get GC'd as long as there's still a control flow that has access to this. It can't. So your controller instance will be fine in that regard, at least until your private method finishes.
What will happen is your controller method will return and control flow will go to the next stage in the middleware chain, meaning your API consumer will likely get the http response before the email is sent. You will lose your HttpContext and all that goes with it when this happens. Thus if there's anything in your Log method or anything else in InternalSendEmail that relies on the HttpContext you need to make sure that information is extracted and provided to the background method before the controller method returns.
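A minimal sketch of what that might look like with the question's example, capturing request-bound values up front; the specific values captured and the extra Log parameters are illustrative.
[HttpGet]
public async Task SendEmail()
{
    // Capture anything needed from the request *now*; HttpContext and its contents
    // may no longer be usable once the response has been sent.
    var userName = User?.Identity?.Name;
    var ip = HttpContext.Connection.RemoteIpAddress?.ToString();

    _ = this.InternalSendEmail(userName, ip);   // deliberately not awaited
}

private async Task InternalSendEmail(string userName, string ip)
{
    try
    {
        await this.Email.Send("to@example.com", "Interesting subject", "Captivating content");
    }
    catch (Exception exc)
    {
        // log using the captured values instead of reading HttpContext here
        Log(exc, userName, ip);
    }
}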
What happens to the thread - almost certainly nothing
As far as the thread goes, most likely the email will be sent on a different thread in the thread pool from that of the original controller method, but either way, no, the .NET runtime isn't going to care about your controller method returning before every task it fired off has completed, let alone kill the thread. That's well above its pay grade. Moreover, it's very rare for threads to be killed at all these days, because it's not just your control flow that's affected; completely unrelated async contexts could be dependent on that thread too.
IIS Application Pool recycling and other things that COULD potentially kill your background task
The only reasonably likely thing that would cause your background task not to complete would be if the process terminated. This happens for example during an IIS Application Pool reset (or equivalent if you're using dotnet hosting), obviously a server restart, etc. It can also happen if there's a catastrophic event like running out of memory, or nasty things like memory faults unique to unsafe or native code. But these things would kill all pending HTTP requests too.
I have seen anecdotal assertions that if there are no pending HTTP requests it makes it more likely that IIS will recycle the application pool on its own even if you have other active code running. After many years of maintaining an application that uses a very similar pattern for many non-critical long-running tasks, I have not seen this happen in practice (and we log every application start to a local TXT file so we would know if this were happening). So I am personally skeptical of this, though I welcome someone providing an authoritative source proving me wrong.
That said, we do set the application pool to reset every day at 4 AM, so to the extent that IIS would be inclined to involuntarily reset our app pools (as it does need to happen every now and then) I suspect this helps mitigate that, and would recommend it regardless. We also allow only one worker process per application pool, rather than allowing IIS to fire off processes whenever it feels like it; I suspect this also makes it less likely IIS would kill the process involuntarily.
In sum - this is perfectly fine for non-critical tasks
I would not use this for critical tasks where unambiguous acknowledgement of success or failure is needed, such as in life critical applications. But for 99+% of real world applications what you're doing is perfectly fine as long as you account for the things discussed above and have some reasonable level of fault tolerance and failsafes in place, which the fact that you're logging the exception shows you clearly do.
PS - If you're interested in having robust progress reporting and you aren't familiar with it, I would look into SignalR, which would allow you to notify the client of a successful email send (or anything else) even after the API call returns, and is shockingly easy to use. Plus an open websocket connection would almost certainly prevent IIS from mistaking a returned API method for an opportunity to kill the process.
Are there any assurances that the .NET Core runtime will wait until all threads initiated in an async action have terminated/ended before destroying the controller instance?
No, this is absolutely not guaranteed.
I don't need to know if the async function actually succeeds or not (e.g. sending the email), I just want to make sure that it attempts to. My fear is that the .NET Core runtime will possibly kill the thread in the middle of execution.
You cannot be sure that it will attempt to do so. The thread (and entire process) may be terminated at any time after the HTTP response is sent. In general, request-extrinsic code is not guaranteed to complete.
Some people are fine with losing some work (e.g., in this case, missing some emails). I'm not, so my systems are all built on a basic distributed architecture, as described on my blog.
It's important to note that work can be lost during any shutdown, and shutdowns are normal:
Any rolling upgrade triggers shutdowns (i.e., application updates).
IIS/ASP.NET recycles the application pool every 29 hours by default.
Runtime and OS patches require shutdowns.
Cloud hosting causes shutdowns (both at the VM level and higher levels).
Bottom line: shutdowns are normal, and shutdowns cause any currently-running request-extrinsic work to be lost. If you're fine with occasionally losing work, then OK; but if you require an assurance that the work is done, then you'll need a basic distributed architecture (a durable queue with a background processor).
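To illustrate the shape of that architecture, here is a hedged sketch for ASP.NET Core. IWorkQueue and EmailWorkItem are hypothetical; a real implementation would persist items to a database table or a message broker rather than memory so that queued-but-unfinished work survives a shutdown.
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.Hosting;

// Hypothetical durable queue abstraction.
public record EmailWorkItem(Guid Id, string To, string Subject, string Body);

public interface IWorkQueue
{
    Task EnqueueAsync(EmailWorkItem item);
    Task<EmailWorkItem> TryDequeueAsync(CancellationToken ct); // null when empty
    Task MarkCompletedAsync(Guid id);
}

// The controller only records that work needs to happen, then returns 202.
public class EmailController : ControllerBase
{
    private readonly IWorkQueue _queue;
    public EmailController(IWorkQueue queue) => _queue = queue;

    [HttpPost]
    public async Task<IActionResult> SendEmail(string to)
    {
        await _queue.EnqueueAsync(new EmailWorkItem(Guid.NewGuid(), to, "Subject", "Body"));
        return Accepted(); // queued, not necessarily done
    }
}

// The background processor runs independently of any request; items queued but not
// completed before a shutdown are still in the durable store afterwards.
public class EmailProcessor : BackgroundService
{
    private readonly IWorkQueue _queue;
    public EmailProcessor(IWorkQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var item = await _queue.TryDequeueAsync(stoppingToken);
            if (item == null)
            {
                await Task.Delay(TimeSpan.FromSeconds(5), stoppingToken);
                continue;
            }

            // send the email here, then mark the item done so it isn't re-processed
            await _queue.MarkCompletedAsync(item.Id);
        }
    }
}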
There are more basic control-flow issues with the logic you are trying to implement. Your biggest problem is not the guarantee of whether it finishes or not.
The example you present is very simple, but in a real-life example you will need some context in InternalSendEmail when it is executed. Because the request has been completely served by the time it runs, there will be no HttpContext, with all the consequences that brings: for example, you cannot even log the IP address of the request, not to mention the more advanced context-bound things like the user (or any other security principal), etc.
Of course you can pass anything as a parameter (for example the IP address), but your logging infrastructure (or your custom log enricher) will probably not work with that. The same is true for any other pipeline component that depends on the context.
I have a list of APIs from different clients saved in my database table, and each API has a different time interval at which it should be called. What should be my approach to calling the APIs? New data may be added to the list of APIs in the table. Should I go for dynamic timers?
I have an application (GUI) which clients use to add new records.
These records represent an API url and the time (Schedule) at which that API should be called.
Your Challenge is to write code that is able to call all the Client specified API's at the specified schedule/time.
To me, calling the API and handling the response (storing it in the DB etc.) should be one component, and scheduling when to call which API should be another component (something like a cron job). This way, when the time is right, the appropriate API call is triggered. This also gives you flexibility to do multiple tries/retries in a day, etc.
Update after your comment:
You have an application (GUI) which clients use to add new records.
These records represent an API url and the time (Schedule) at which that API should be called.
Your Challenge is to write code that is able to call all the Client specified API's at the specified schedule/time.
If I have got that problem right - my original suggestion stands.
Component 1 - Scheduler
Use Quartz.NET (or create your own using a Timer, etc.) and create a service (say WCF) or process which reads records from the database and identifies all the schedules and the API URLs that need to be called. When the scheduled time arrives, Quartz.NET will trigger your handler method, where you make a call to Component 2 and pass on the API URL.
Component 2 - API Engine
When it receives a call from Component 1, it makes the API call and fetches the response, then stores/processes it as required.
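A minimal sketch of the two components using Quartz.NET 3.x; CallApiJob, ApiScheduler and the cron expression are illustrative, and newly added records would still need to be picked up, for example by periodically re-reading the table and rescheduling.
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

// Component 2's entry point: Quartz calls Execute when a trigger fires.
public class CallApiJob : IJob
{
    private static readonly HttpClient Http = new HttpClient();

    public async Task Execute(IJobExecutionContext context)
    {
        var url = context.MergedJobDataMap.GetString("url");
        var response = await Http.GetStringAsync(url);
        // store/process the response as required
    }
}

public static class ApiScheduler
{
    // Component 1: read the client records from the database and register
    // one job + trigger per record.
    public static async Task ScheduleFromDatabase(IEnumerable<(string Url, string Cron)> records)
    {
        var scheduler = await new StdSchedulerFactory().GetScheduler();
        await scheduler.Start();

        foreach (var record in records)
        {
            var job = JobBuilder.Create<CallApiJob>()
                                .UsingJobData("url", record.Url)
                                .Build();

            var trigger = TriggerBuilder.Create()
                                        .WithCronSchedule(record.Cron) // e.g. "0 0/15 * * * ?"
                                        .Build();

            await scheduler.ScheduleJob(job, trigger);
        }
    }
}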
There are various schedulers that can be used to do this automatically. For example, you could use Quartz.NET and its AdoJobStore. I haven't used that myself, but it sounds appropriate:
With the use of the included AdoJobStore, all Jobs and Triggers configured as "non-volatile" are stored in a relational database via ADO.NET.
Alternatively, your database may well have timers built into it. However, if this is primarily an academic exercise (as suggested by "your challenge") you may not be able to use these.
I would keep a table of scheduled tasks, with columns specifying:
When the task should next be run
What the task should do
How to work out the next iteration of that task afterwards
If the task has been started, when it was started
If the task completed, when it completed
You can then write code in an infinite loop to just scan that table, e.g. once per minute. It should look for all tasks with a "next time" earlier than now that haven't completed:
If the task hasn't been started, update the row to show that it has been started (now), and start executing the task
If the task was started recently, ignore it
If the task was started "a long time ago" (i.e. longer than it would take to run successfully), either mark it as "broken" somehow, or restart
When a task completes successfully, update the row to indicate that it's finished, and add another row for the next time it should be started.
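A minimal sketch of that scanning loop, assuming a hypothetical ITaskStore wrapping the table described above and an ExecuteTask method for the actual work:
static void RunTaskLoop(ITaskStore taskStore, TimeSpan maxRunTime)
{
    while (true)
    {
        // all tasks whose "next run" time is earlier than now and that haven't completed
        foreach (var task in taskStore.GetTasksDueBy(DateTime.UtcNow))
        {
            if (task.StartedAt == null)
            {
                taskStore.MarkStarted(task.Id, DateTime.UtcNow);
                ExecuteTask(task);                          // "what the task should do"
                taskStore.MarkCompleted(task.Id, DateTime.UtcNow);
                taskStore.AddNextOccurrence(task);          // row for the next iteration
            }
            else if (DateTime.UtcNow - task.StartedAt > maxRunTime)
            {
                taskStore.MarkBroken(task.Id);              // or reset StartedAt to restart it
            }
            // else: started recently, so ignore it
        }

        Thread.Sleep(TimeSpan.FromMinutes(1));              // scan roughly once per minute
    }
}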
You'll need to work out exactly what your error strategy is:
How long should the gap be between a task starting and you deciding it's failed?
Do you always want to restart the task, or should some failures be permanent?
Do you need to record how often a task failed, and give up after a certain number of tries?
What do you do if you explicitly notice that the task has failed while you're executing it? (Rather than just by the fact that it was started a long time ago.)
For extra reliability, you'd need to think of other aspects too:
Do you need multiple task runners?
How can you spot when a task runner has failed, and restart that?
How do you deal with multiple task runners trying to start the same task at the same time?
You may not need to actually implement everything here, but it's worth considering them.
Here is my problem: I have a WCF project, which doesn't really matter in fact because I believe it's more about C#/.NET. In my WCF service, when a client requests one of the methods, I validate the input, and if it succeeds I start some business logic calculations. I want to start this logic in another thread/task so that after the input validation I can immediately return a response. It's something like this:
XXXX MyMethod(MyArgument arg)
{
    var validation = _validator.Validate(arg);
    if (validation.Succeed)
    {
        Task.Run(() => businessLogic());
    }
    return new MyResponseModel();
}
I need to make it like this because my businessLogic can take a long time (calculations and database saves at the end), but the client requesting the service has to know immediately whether the model is correct.
In the businessLogic calculations/saves that will be running on the background thread, I have to catch exceptions if something fails and save them to the database. (It's pretty big logic, so many exceptions can be thrown; for example, after the calculations I persist the object to the database, so a save error can be thrown if the database is offline.)
How do I correctly implement this, and what should I use for such requirements? I am just wondering whether using Task.Run and invoking all the logic inside it is good practice.
You can do it like this.
Be aware, though, that worker processes can exit at any time. In that case outstanding work will simply be lost. Maybe you should queue the work to a message queue instead.
Also, if the task "crashes" you will not be notified in any way. Implement your own error logging.
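For example, a hedged sketch of the question's method with that logging added; businessLogic and LogErrorToDatabase stand in for the real implementations.
XXXX MyMethod(MyArgument arg)
{
    var validation = _validator.Validate(arg);
    if (validation.Succeed)
    {
        Task.Run(() =>
        {
            try
            {
                businessLogic();             // long-running calculations and DB saves
            }
            catch (Exception ex)
            {
                LogErrorToDatabase(ex);      // nothing else will ever observe this failure
            }
        });
    }
    return new MyResponseModel();
}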
Also, there is no limit to the number of tasks that you can spawn like this. If processing is too slow more and more work will queue up. This might not at all be a problem if you know that the server will not be overloaded.
It was suggested that Task.Run will use threads and therefore not scale. This is not necessarily so. Usually, the bottleneck of any processing is not the number of threads but the backend resources being used (database, disk, services, ...). Even using hundreds of threads is not in any way likely to be a bottleneck. Async IO is not a way around backend resource constraints.
I came across a nice little tool that was added to ASP.NET in v4.5.2.
I am wondering how safe it is and how one can effectively utilize it in an ASP.NET MVC or Web API scenario.
I know I am always wanting to do a quick and simple fire and forget task in my web applications. For example:
Sending emails
Sending push notifications
Logging analytics or errors to the db
Now typically I just create a method called
public async Task SendEmailAsync(string to, string body)
{
    //TODO: send email
}
and I would use it like so:
public async Task<ActionResult> Index()
{
    ...
    await SendEmailAsync(User.Identity.Username, "Hello");
    return View();
}
Now my concern with this is that I am delaying the user in order to send the email to them. This doesn't make much sense to me.
So I first considered just doing:
Task.Run(()=> SendEmailAsync(User.Identity.Username, "Hello"));
However, when reading up about this, it is apparently not the best thing to do in an IIS environment (I'm not 100% sure on the specifics).
So this is where I came across HostingEnvironment.QueueBackgroundWorkItem(x=> SendEmailAsync(User.Identity.Username, "Hello"));
This is a very quick and easy way to offload the send-email task to a background worker and serve up the user's View() much quicker.
Now I am aware this is not for tasks running longer than 90 seconds, and that execution is not 100% guaranteed.
But my question is:
Is HostingEnvironment.QueueBackgroundWorkItem() sufficient for sending emails, push notifications, DB queries, etc. in a standard ASP.NET web site?
It depends.
The main benefit of QueueBackgroundWorkItem is the following, emphasis mine (source):
Differs from a normal ThreadPool work item in that ASP.NET can keep track of how many work items registered through this API are currently running, and the ASP.NET runtime will try to delay AppDomain shutdown until these work items have finished executing.
Essentially, QueueBackgroundWorkItem helps you run tasks that might take a couple of seconds by attempting not to shut down your application while there's still a task running.
Running a normal database query or sending out a push notification should be a matter of a couple hundred milliseconds (or a few seconds); neither should take a very long time and should thus be fine to run within QueueBackgroundWorkItem.
However, there's no guarantee for the task to finish — as you said, the task is not awaited. It all depends on the importance of the task to execute. If the task must complete, it's not a good candidate for QueueBackgroundWorkItem.
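For reference, a minimal sketch of queueing the question's SendEmailAsync with the Task-returning overload, capturing request data first (requires System.Web.Hosting):
// ASP.NET tracks this work item and tries to delay AppDomain shutdown until it
// finishes; the token is signalled if shutdown can no longer be delayed.
var userName = User.Identity.Name;   // capture request-bound data before queueing

HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
{
    await SendEmailAsync(userName, "Hello");
});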
This question already has answers here:
When should I use Async Controllers in ASP.NET MVC?
(8 answers)
Closed 6 years ago.
I am working with a pre-existing C# ASP.NET MVC web application, and I'm adding some functionality to it that I can't decide whether or not to make async.
Right now, the application home page just processes a page and a user logs in. Nothing more, and nothing asynchronous going on at all.
I am adding functionality that will, when the homepage is visited, generate a call to a Web API that subsequently calls a database that grabs an identifier and returns it to an HTML tag on the home page. This identifier will not be visible on the screen, only on the source/HTML view (this is being added for various internal tracking purposes).
The Web API/database call is simple: just grab an identifier and return it to the controller. Regardless, I'm wondering whether the app should make this call asynchronously. The website traffic isn't immense, but I'm still wondering about concurrency, performance and future scalability.
The one catch is that I'd have to make the entire action method async and I'm not sure what the effects of that would be. The basic pattern, currently synchronous, is below:
public ActionResult Index()
{
    var result = GetID();
    ViewBag.result = result.Data;
    return View();
}

public JsonResult GetID()
{
    var result = "";
    var url = "http://APIURL/GetID";
    using (WebClient client = new WebClient())
    {
        result = client.DownloadString(url);
    }
    return Json(result, JsonRequestBehavior.AllowGet);
}
Any thoughts?
First and foremost, realize the purpose of async, in the context of a web application. A web server has what's called a thread pool. Generally speaking, 1 thread == 1 request, so if you have 1000 threads in the pool (typical), your website can roughly serve 1000 simultaneous requests. Also keep in mind that it often takes many requests to render a single resource. The HTML document itself is one request, but each image, JS file, CSS file, etc. is also a request. Then, there's any AJAX requests the page may issue. In other words, it's not uncommon for a request for a single resource to generate 20+ requests to the web server.
Given that, when your server hits its max requests (all threads are being utilized), any further requests are queued and processed in order as threads are made available. What async does is buy you some additional head room. If there's threads that are in a wait-state (waiting for the results of a database query, the response from a web service, a file to be read from the filesystem, etc.), then async allows these threads to be returned to the pool, where they are then able to field some of those waiting requests. When whatever the thread was waiting on completes, a new thread is requested to finish servicing the request.
What is important to note here is that a new thread is requested to finish servicing the request. Once the thread has been released to the pool, you have to wait for a thread again, just like a brand new request. This means running async can sometimes take longer than running sync, depending on the availability of threads in the pool. Also, async carries with it a non-insignificant amount of overhead that also adds to the overall load time.
Async != faster. It can many times be slower, but it allows your web server to more efficiently utilize resources, which could mean the difference between falling down and gracefully bearing load. Because of this, there's no one universal answer to a question like "Should I just make everything async?" Async is a trade-off between raw performance and efficiency. In some situations it may not make sense to use async at all, while in others you might want to use it for everything that's applicable. What you need to do is first identify the stress points of your application. For example, if your database instance resides on the same server as your web server (not a good idea, BTW), using async on your database queries would be fairly pointless. The chief culprit of waiting is network latency, whereas filesystem access is typically relatively quick. On the other hand, if your database server is in a remote datacenter and has to not only travel the pipes between there and your web server but also do things like traverse firewalls, well, then your network latency is much more significant, and async is probably a very good idea.
Long and short, you need to evaluate your setup, your environment and the needs of your application. Then, and only then, can you make smart decisions about this. That said, even given the overhead of async, if there's network latency involved at all, it's a pretty safe bet async should be used. It's perfectly acceptable to err on the side of caution and just use async everywhere it's applicable, and many do just that. If you're looking to optimize for performance though (perhaps you're starting the next Facebook?), then you'd want to be much more judicious.
Here, the reason to use async IO is to not have many threads running at the same time. Threads consume OS resources and memory. The thread pool also can be a little slow to adjust to sudden load. If your thread count in a web app is below 100 and load is not extremely spiky, you have nothing to worry about.
Generally, the slower a web service and the more often it is called the more beneficial async IO can be. You will need on average (latency * frequency) threads running. So 100ms call time and 10 calls per second is about 1 thread on average.
Run the numbers and see if you need to change anything or not.
Any thoughts?
Yes, lots of thoughts...but that alone doesn't count as an answer. ;)
There is no real good answer here since there isn't much context provided. But let's address what we know.
Since we are a web application, each request/response cycle has a direct impact on performance and can be a bottleneck.
Since we are internally invoking another API call from ours, we shouldn't assume that it is hosted on the same server; as such, this should be treated just like any other I/O-bound operation.
With the two known factors above, we should make our calls async. Consider the following:
public async Task<ActionResult> Index()
{
    var result = await GetIdAsync();
    ViewBag.result = result.Data;
    return View();
}

public async Task<JsonResult> GetIdAsync()
{
    var result = "";
    var url = "http://APIURL/GetID";
    using (WebClient client = new WebClient())
    {
        // Note the API change here: DownloadStringTaskAsync returns an awaitable Task<string>
        result = await client.DownloadStringTaskAsync(url);
    }
    return Json(result, JsonRequestBehavior.AllowGet);
}
Now, we are correctly using the async and await keywords with our Task<T>-returning operations. This will help to ensure ideal performance. Notice the API change to client.DownloadStringTaskAsync too; this is very important, as WebClient.DownloadStringAsync is the older event-based method and cannot be awaited.