Best Practice for I/O Heavy Async Task with WebApi - c#

I have an async action responding to an HTTP POST via Web API 1.0. I need to do two things when I receive this request:
1. Do a database insert and return the identity of that new entry to the WebApp that called the function.
2. Use that identity to do a whole bunch of work that is I/O heavy, which the WebApp and the user don't immediately care about.
In a perfect world I would put the data on a queue somewhere and have a little worker handle the queue. Since I can't immediately do that, what is the best way to make sure this work gets done without impacting the user?
[HttpPost]
public async Task<int> Post([FromBody]Object myObject)
{
    return await new ObjectLogic().InsertObject(myObject);
}

public async Task<int> InsertObject(Object myObject)
{
    var id = await new ObjectData().InsertObjectRoot(myObject);
    Task.Run(() => new ObjectData().ObjectWork(id, myObject));
    return id;
}
This is the solution I came up with, but I think there has to be something better, since I am basically stealing a thread from the thread pool until my work is finished. Is there a better way? I think I could use ConfigureAwait(false) in my InsertObject method, since I really don't care about the context there.
// await the async function but use ConfigureAwait
public async Task<int> InsertObject(Object myObject)
{
    var id = await new ObjectData().InsertObjectRoot(myObject);
    await new ObjectData().ObjectWork(id, myObject).ConfigureAwait(false);
    return id;
}

One question is whether your Web API should do anything other than
receive the request
place it on a queue
respond with an id to indicate that the request has been received.
It's going to depend to some degree on what sort of load you're expecting or might possibly see. But if you're concerned about the number of available threads from the outset then perhaps the answer is that your Web API does nothing but the above steps.
The queue could be a literal queue, like MSMQ (or whatever is popular now.) Or it could consist of a record inserted into a table. A separate Windows service could then process that queue and do the I/O heavy work. It doesn't even have to be on the same server. You can scale it separately.
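For the table-as-queue variant, the controller's part can shrink to something like this sketch; EnqueueObjectWork is a hypothetical data-access method that just inserts a work-item row, and the separate service is what actually runs ObjectWork later:
[HttpPost]
public async Task<int> Post([FromBody]Object myObject)
{
    var data = new ObjectData();

    // 1. Do the insert the caller actually cares about.
    var id = await data.InsertObjectRoot(myObject);

    // 2. Record the heavy I/O work as a row in a work-item table (hypothetical method);
    //    a separate Windows service polls that table and runs ObjectWork.
    await data.EnqueueObjectWork(id, myObject);

    // 3. Respond immediately with the id.
    return id;
}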
If the user does want some eventual indication then they could poll for it at intervals using the id that you returned. But for me the key is in this statement:
Using that identity to do a whole bunch of work that is I/O heavy, which the WebApp and the user don't immediately care about.
The job of a web application is to serve responses - IOW, to do what the user does care about. If it's long-running, I/O heavy work that the user doesn't care about then I'd consider offloading it.

Related

REST-Call with async server implementation

In my application I have a server which provides a REST API that my UI communicates with.
Now I have to start a long-running process on the server, but I don't want the client to wait for the server's response.
I know I could just fire-and-forget the POST call and not await the response, but I need to know on the client that the process on the server was started correctly.
So I thought about the following:
[HttpPost]
[Route("startscan")]
public HttpResponseMessage StartScan()
{
    Task.Factory.StartNew(() =>
    {
        // Do long running things here
    });
    return Request.CreateResponse(HttpStatusCode.OK);
}
So my question now is: will the task I started be executed to its end? The background of the question is that, as far as I know, a controller instance is created for each call to it. So will the instance of the controller be terminated when the request finishes, or will the task I started run to its end? The task can take up to 10 minutes or longer sometimes.
A simple approach would be to just use an asynchronous method without awaiting the Task result like so:
[HttpPost]
[Route("startscan")]
public async Task<HttpResponseMessage> StartScan()
{
    DoLongRunningThings();  // deliberately not awaited
    return Request.CreateResponse(HttpStatusCode.OK);
}

public async Task DoLongRunningThings()
{
    // Do Stuff here
}
However, if your processing is more complex and requires more resilience, you should look into how to use background jobs. Here is a good collection of what you can use: https://stackoverflow.com/a/36098218/16154479
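If you stay in-process on ASP.NET 4.5.2 or later, HostingEnvironment.QueueBackgroundWorkItem is a somewhat safer fire-and-forget than a bare Task.Factory.StartNew, because the runtime knows about the work and briefly delays app-domain shutdown while registered items finish. A sketch (the Task.Delay stands in for the real scan; this is still in-process, so it is not a guarantee for 10-minute jobs):
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Hosting;
using System.Web.Http;

public class ScanController : ApiController
{
    [HttpPost]
    [Route("startscan")]
    public HttpResponseMessage StartScan()
    {
        // The runtime tracks this work and passes a CancellationToken that is
        // signalled when the app domain starts shutting down.
        HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
        {
            // Do long running things here; observe cancellationToken where possible.
            await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
        });

        return Request.CreateResponse(HttpStatusCode.OK);
    }
}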
Will the task I started be executed to its end?
Probably yes, but also possibly no. Sometimes it won't be executed completely. This is the result of misusing an ASP.NET app (which handles HTTP requests) as a background service (which runs outside of a request context).
There is a best practice that avoids possibly-partial execution: a distributed architecture (as I describe on my blog). The idea is that your controller enqueues a message describing the work to be done into a durable queue and then returns. Then there's a separate background service that reads from the queue and does the actual work.
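The consuming side then lives outside ASP.NET entirely. A minimal polling-loop sketch for such a background service (FetchNextWorkItemAsync, ProcessAsync, MarkCompletedAsync and MarkFailedAsync are hypothetical placeholders for your own storage and processing code):
using System;
using System.Threading;
using System.Threading.Tasks;

// Sketch of the queue consumer: it owns the long-running work, so a recycled
// or shut-down web app can never cut the work short.
public class QueueWorker
{
    public async Task RunAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            var workItem = await FetchNextWorkItemAsync();   // hypothetical: read the oldest unprocessed message/row
            if (workItem == null)
            {
                await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
                continue;
            }

            try
            {
                await ProcessAsync(workItem);                // hypothetical: the actual long-running work
                await MarkCompletedAsync(workItem);          // hypothetical: flag the item as done
            }
            catch (Exception ex)
            {
                await MarkFailedAsync(workItem, ex);         // hypothetical: record the failure for retry
            }
        }
    }

    // Placeholders so the sketch compiles; replace with real storage and processing.
    private Task<object> FetchNextWorkItemAsync() => Task.FromResult<object>(null);
    private Task ProcessAsync(object workItem) => Task.CompletedTask;
    private Task MarkCompletedAsync(object workItem) => Task.CompletedTask;
    private Task MarkFailedAsync(object workItem, Exception ex) => Task.CompletedTask;
}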

Parallelizing synchronous tasks while retaining the HttpContext.Current in ASP.NET

I've scoured SO for answers but found none that pertain to the problem at hand, although this one nails the "why" but doesn't solve it.
I have a REST endpoint that needs to gather data from other endpoints - in doing so, it accesses the HttpContext (setting authentication, headers, etc... all done with 3rd party lib I don't have access to).
Unfortunately, this library for service communication is made to be synchronous, and we want to parallelize its use.
In the following example (abstracted) code, the issue is that CallEndpointSynchronously unfortunately uses some built-in authentication, which throws a null exception when HttpContext isn't set:
public class MyController : ApiController
{
    //...
    [HttpPost]
    public async Task<IHttpActionResult> DoIt(IEnumerable<int> inputs)
    {
        var tasks = inputs.Select(i =>
            Task.Run(() =>
            {
                /* call some REST endpoints, pass some arguments, get the response from each.
                   The obvious answer (HttpContext.Current = parentContext) can't work because
                   there's some async code underneath (for whatever reasons), and that would cause it
                   to sometimes not return to the same thread, and basically abandon the Context,
                   again resulting in null */
                var results = Some3rdPartyTool.CallEndpointSynchronously(MyRestEndpointConfig[i]);
                return results;
            }));
        var outcome = await Task.WhenAll(tasks);
        // collect outcome, do something with it, render outputs...
    }
}
Is there a cure for this?
We want to optimize for single requests, not interested in maximizing parallel users at this moment.
Unfortunately, this library for service communication is made to be synchronous, and we want to parallelize its use.
throws a null exception when HttpContext isn't set:
The obvious answer (HttpContext.Current = parentContext) can't work because there's some async code underneath (for whatever reasons), and that would cause it to sometimes not return to the same thread, and basically abandon the Context, again resulting in null
There's an important part of your question in the example code comment. :)
Normally, HttpContext shouldn't be shared across threads. It's just not threadsafe at all. But you can set HttpContext.Current (for some reason), so you can choose to live dangerously.
The more insidious problem here is that the library has a synchronous API and is doing sync-over-async - but somehow without deadlocking (?). At this point, I must be honest and say the best approach is to fix the library: make the vendor fix it, or submit a PR, or just rewrite it if you have to.
However, there is a tiny chance that you can get this kinda sorta working by adding Even More Dangerous code.
So, here's the information you need to know:
ASP.NET (pre-Core) uses an AspNetSynchronizationContext. This context:
Ensures that only one thread runs in this context at a time.
Sets HttpContext.Current for any thread that is running in the context.
Now, you could capture the SynchronizationContext.Current and install it on the thread pool threads, but in addition to being Very Dangerous, it would not achieve your actual goal (parallelization), since the AspNetSynchronizationContext only allows one thread in at a time. The first portion of the 3rd-party code would be able to run in parallel, but anything queued to the AspNetSynchronizationContext would run one thread at a time.
So, the only way I can think of making this work is to use your own custom SynchronizationContext that resumes on the same thread, and set HttpContext.Current on that thread. I have an AsyncContext class that can be used for this:
[HttpPost]
public async Task<IHttpActionResult> DoIt(IEnumerable<int> inputs)
{
    var context = HttpContext.Current;
    var tasks = inputs.Select(i =>
        Task.Run(() =>
            AsyncContext.Run(() =>
            {
                HttpContext.Current = context;
                var results = Some3rdPartyTool.CallEndpointSynchronously(MyRestEndpointConfig[i]);
                return results;
            })));
    var outcome = await Task.WhenAll(tasks);
    return Ok(outcome);
}
So for each input, a thread is grabbed from the thread pool (Task.Run), a custom single-threaded synchronization context is installed (AsyncContext.Run), HttpContext.Current is set, and then the code in question is run. This may or may not work; it depends on how exactly Some3rdPartyTool uses its SynchronizationContext and HttpContext.
Note that there are several bad practices in this solution:
Using Task.Run on ASP.NET.
Accessing the same HttpContext instance simultaneously from multiple threads.
Using AsyncContext.Run on ASP.NET.
Blocking on asynchronous code (done by AsyncContext.Run and also presumably by Some3rdPartyTool).
In conclusion, I again recommend updating/rewriting/replacing Some3rdPartyTool. But this pile of hacks might work.
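For comparison, if the tool can be replaced or wrapped, a genuinely asynchronous call shape parallelizes cleanly with Task.WhenAll and never needs HttpContext.Current on worker threads. This is a sketch only: CallEndpointAsync and the URL are invented, and your endpoints and authentication will differ.
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;

public class MyController : ApiController
{
    private static readonly HttpClient Client = new HttpClient();

    [HttpPost]
    public async Task<IHttpActionResult> DoIt(IEnumerable<int> inputs)
    {
        // Read anything you need from the request *here*, on the request thread,
        // and pass it along as plain values instead of relying on ambient context.
        var auth = Request.Headers.Authorization?.ToString();

        var tasks = inputs.Select(i => CallEndpointAsync(i, auth));
        var outcome = await Task.WhenAll(tasks);
        return Ok(outcome);
    }

    // Invented for illustration: a truly asynchronous call, no sync-over-async.
    private static async Task<string> CallEndpointAsync(int i, string auth)
    {
        using (var request = new HttpRequestMessage(HttpMethod.Get, "https://example.com/endpoint/" + i))
        {
            if (auth != null)
                request.Headers.TryAddWithoutValidation("Authorization", auth);
            using (var response = await Client.SendAsync(request))
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }
        }
    }
}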

Wait for a third-party API callback

I need to create a REST API that connects to a third-party SOAP API. The third-party API's events are sent by callback to a URL I provide.
The typical flow my API goes through: it starts a session with the third party by providing an ID and a callback URL. The third party can now send new events to my API through this URL when, for example, a new participant connects. Sometimes I need to request specific info, like the list of participants for a given session (ID), and wait for the event containing the info.
Note that there may be multiple open sessions at the same time.
An example of what I need:
private string url = "http://myapi/callback";

[HttpGet]
[Route("createSession")]
public async Task<string> CreateSession()
{
    var id = Guid.NewGuid().ToString();
    var result = await ExternAPI.CreateSession(id, this.url);
    return result; // contains the id
}

[HttpGet]
[Route("endSession")]
public async Task<string> EndSession([FromUri] string id)
{
    var result = await ExternAPI.EndSession(id);
    return result;
}

[HttpGet]
[Route("participants")]
public async Task<string> Participants([FromUri] string id)
{
    ExternAPI.participants(id); // The results of this method will be sent to the callback function
    results = // Wait for the results for this id
    return results;
}

[HttpPost]
[Route("callback")]
public void Callback(/* body */)
{
    // notify waiting function and pass body
}
I came up with a solution using ReactiveX, but I'm not really sure about its reliability in production. What I have in mind is to create a subject that never terminates and handles all the events, but that is not a usual lifetime for a subject; what happens on error? And I don't think I did it the "RX-way" (state concerns).
Here it is (you will need System.Reactive to run this code):
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading;
using System.Threading.Tasks;

class Data
{
    public int id;
    public string value;
}

class Program
{
    private static Subject<Data> sub;

    static void Main(string[] args)
    {
        sub = new Subject<Data>();
        Task.Run(async () =>
        {
            int id = 1;
            ExternAPI(CallBackHook, id);
            Data result = await sub.Where(data => data.id == id).FirstAsync();
            Console.WriteLine("{0}", result.value);
        });
        Console.ReadLine();
    }

    static void CallBackHook(Data data)
    {
        sub.OnNext(data);
    }

    static String ExternAPI(Action<Data> callback, int id)
    {
        // Third-party API, access via SOAP. callback is normally a url (string)
        Task.Run(() =>
        {
            Thread.Sleep(1000);
            callback(new Data { id = id, value = "test" });
        });
        return "success";
    }
}
Another way would be a dictionary of subjects, one for each session, so I could manage their lifetimes.
it is not a usual lifetime for a subject
what happens on error?
And I don't think I did it the "RX-way"
Yes, these are all perfectly valid concerns with this kind of approach. Personally, I don't much mind the last one, because even though Subjects are frowned-upon, many times they're just plain easier to use than the proper Rx way. With the learning curve of Rx what it is, I tend to optimize for developer maintainability, so I do "cheat" and use Subjects unless the alternative is equally understandable.
Regarding lifetime and errors, the solutions there depend on how you want your application to behave.
For lifetime, it looks like currently you have a WebAPI resource (the SOAP connection) requiring an explicit disconnect call from your client; this raises some red flags. At the very least, you'd want some kind of timeout there where that resource is disposed even if endSession is never called. Otherwise, it'll be all too easy to end up with dangling resources.
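One possible shape for that timeout (a sketch only, using System.Runtime.Caching's MemoryCache; SessionRegistry, the 30-minute window and the onExpired delegate are assumptions, and MemoryCache eviction timing is approximate rather than exact):
using System;
using System.Runtime.Caching;

// Sketch: tracks open third-party sessions and invokes a cleanup delegate
// (e.g., something that calls ExternAPI.EndSession) if the client never ends them.
public static class SessionRegistry
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static void Register(string sessionId, Action<string> onExpired)
    {
        var policy = new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMinutes(30),   // assumed idle timeout
            RemovedCallback = args => onExpired((string)args.CacheItem.Key)
        };
        Cache.Set(sessionId, DateTimeOffset.UtcNow, policy);
    }

    // Call whenever the session is used, to refresh the sliding expiration.
    public static void Touch(string sessionId) => Cache.Get(sessionId);

    // Explicit endSession; removal also fires RemovedCallback.
    public static void Unregister(string sessionId) => Cache.Remove(sessionId);
}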
Also for errors, you'll need to decide the appropriate approach. You could "cache" the error and report it to each call that tries to use that resource, and "clear" the error when endSession is called. Or, if it's more appropriate, you could let an error take down your ASP.NET process. (ASP.NET will restart a new one for you).
To delay an API until you get some other event, use TaskCompletionSource<T>. When starting the SOAP call (e.g., ExternAPI.participants), you should create a new TCS<T>. The API call should then await the TaskCompletionSource<T>.Task. When the SOAP service responds with an event, it should take that TaskCompletionSource<T> and complete it (see the sketch after the notes below). Points of note:
If you have multiple SOAP calls that are expecting responses over the same event, you'll need a collection of TaskCompletionSource<T> instances, along with some kind of message-identifier to match up which events are for which calls.
Be sure to watch your thread safety. Incoming SOAP events are most likely arriving on the thread pool, with (possibly multiple) API requests on other thread pool threads. TaskCompletionSource<T> itself is threadsafe, but you'd need to make your collection threadsafe as well.
You may want to write a Task-based wrapper for your SOAP service first (handling all the TaskCompletionSource<T> stuff), and then consume that from your WebAPI.
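For illustration, here is a minimal shape of such a wrapper (a sketch under assumptions: the session id doubles as the message identifier, ParticipantsResult is an invented payload type, and startSoapRequest stands in for something like ExternAPI.participants):
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Invented payload type for illustration.
public class ParticipantsResult
{
    public string SessionId { get; set; }
    public string[] Participants { get; set; }
}

// Sketch: a Task-based bridge between the callback-style SOAP API and the Web API actions.
public class SoapBridge
{
    // One pending TaskCompletionSource per session id that is awaiting a callback.
    private readonly ConcurrentDictionary<string, TaskCompletionSource<ParticipantsResult>> _pending =
        new ConcurrentDictionary<string, TaskCompletionSource<ParticipantsResult>>();

    // Called from the participants action: kick off the SOAP request, then await the callback.
    public Task<ParticipantsResult> GetParticipantsAsync(string sessionId, Action<string> startSoapRequest)
    {
        // RunContinuationsAsynchronously (.NET 4.6+) keeps the callback thread from running awaiters inline.
        var tcs = new TaskCompletionSource<ParticipantsResult>(TaskCreationOptions.RunContinuationsAsynchronously);
        _pending[sessionId] = tcs;          // last request per session wins, for simplicity
        startSoapRequest(sessionId);        // e.g., ExternAPI.participants(sessionId)
        return tcs.Task;
    }

    // Called from the callback action when the third party posts the event.
    public void OnCallback(ParticipantsResult result)
    {
        if (_pending.TryRemove(result.SessionId, out var tcs))
            tcs.TrySetResult(result);
        // else: nothing is waiting for this event; ignore or log it.
    }
}
The Participants action then awaits GetParticipantsAsync and the Callback action forwards the posted body to OnCallback; adding a timeout (for example, cancelling the TaskCompletionSource after some delay) is worthwhile so a lost SOAP event cannot park a request forever.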
As a very broad alternative, instead of bridging SOAP with WebAPI, I would consider bridging SOAP with SignalR. You may find that this is a more natural translation. Among other things, SignalR will give you client-connect and client-disconnect events (complete with built-in timeouts for clients). So that may solve your lifetime issues more naturally. You can use the same Task-based wrapper for your SOAP service as well, or just expose the SOAP events directly as SignalR messages.
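If you go that route with ASP.NET SignalR 2, the bridge can push SOAP events straight to subscribed clients instead of parking an HTTP request. A rough sketch only; the hub, the group-per-session idea, and the dynamic participants client method are invented for illustration:
using System.Threading.Tasks;
using Microsoft.AspNet.SignalR;

// Sketch: clients join a group named after the third-party session they care about.
public class SessionHub : Hub
{
    public Task JoinSession(string sessionId)
    {
        return Groups.Add(Context.ConnectionId, sessionId);
    }
}

// Called from the callback Web API action (or wherever the SOAP event lands).
public static class SessionNotifier
{
    public static void PublishParticipants(string sessionId, object payload)
    {
        var hub = GlobalHost.ConnectionManager.GetHubContext<SessionHub>();
        hub.Clients.Group(sessionId).participants(payload);   // dynamic client-side method name
    }
}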

Async/await and resource access

I'm wondering how to implement resource access nicely using async/await. I have a singleton service in a web application that acts as a proxy to LDAP and has to buffer all data on first access; every invocation after that is served from the cache, but after some time the cache is invalidated and the data should be fetched again. My implementation currently looks like this, but it does not meet my requirements:
public async Task<string> GetUserDisplayName(string username)
{
    var users = await GetCachedUsers();
    // code using users from cache
}

private async Task<IEnumerable<LdapUser>> GetCachedUsers()
{
    var users = _Cache.Get<IEnumerable<LdapUser>>();
    if (users == null)
    {
        users = await _Connector.GetAllUsers();
        _Cache.Add(users, TimeSpan.FromHours(USER_CACHE_VALID_HOURS));
    }
    return users;
}
I'm wondering how to implement this so that when a couple of requests hit the service for the first time, they all await the same task rather than being blocked, and the download from LDAP happens only once. I could do this traditionally and lock the resource, but then those threads would be blocked, and I want them to go back to the thread pool asynchronously, as in the async/await pattern.
SemaphoreSlim has a WaitAsync method that will let you create a critical section in asynchronous code. You can use that semaphore to prevent multiple invocations of the method from generating the value together without actually blocking any of the threads.
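A sketch of what that looks like for the cache above (reusing the question's _Cache, _Connector and USER_CACHE_VALID_HOURS members; the second cache check after WaitAsync is what stops the LDAP download from running more than once):
private readonly SemaphoreSlim _usersLock = new SemaphoreSlim(1, 1);

private async Task<IEnumerable<LdapUser>> GetCachedUsers()
{
    var users = _Cache.Get<IEnumerable<LdapUser>>();
    if (users != null)
        return users;

    // Only one caller at a time gets past this point; the others await
    // asynchronously (no thread is blocked) and then see the freshly filled cache.
    await _usersLock.WaitAsync();
    try
    {
        users = _Cache.Get<IEnumerable<LdapUser>>();
        if (users == null)
        {
            users = await _Connector.GetAllUsers();
            _Cache.Add(users, TimeSpan.FromHours(USER_CACHE_VALID_HOURS));
        }
        return users;
    }
    finally
    {
        _usersLock.Release();
    }
}
An alternative with the same effect is to cache the Task<IEnumerable<LdapUser>> itself (an "async lazy" approach), so concurrent first callers all await one shared task.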

Async does not work in asynchronous controller mvc 4.0

I have an MVC 4.0 application targeted at targetFramework="4.5".
I have to basically convert the existing functionality of file processing from synchronous to asynchronous (so that for a large file the user doesn't have to wait for the processing task).
My code is
[HttpPost]
public async Task<ActionResult> FileUpload(HttpPostedFileBase fileUpload)
{
    Coreservice objVDS = new Coreservice();
    // validate the contents of the file
    model = objVDS.ValidateFileContents(fileUpload);
    // if the file is valid, start processing asynchronously
    await Task.Factory.StartNew(() => { objVDS.ProcessValidFile(model); },
        CancellationToken.None,
        TaskCreationOptions.DenyChildAttach,
        TaskScheduler.FromCurrentSynchronizationContext());
    return View();
}
Basically I want to call an asynchronous method which is in a service that does database operations (different project).
I want the asynchronous process to have access to the context in the service methods. That's why I am using
TaskScheduler.FromCurrentSynchronizationContext() in Task.Factory.StartNew().
The service method is like the following, in which, based on the file type, a second service is called for data operations:
public async Task ProcessValidFile(fileProcessDataModel model)
{
    employeeWorkedDataservice service = new employeeWorkedDataservice();
    await Task.Factory.StartNew(() =>
    {
        service.ProcessEmployeeDataFile(model.DataSetToProcess, OriginalFileName, this, model.Source);
    },
    CancellationToken.None,
    TaskCreationOptions.DenyChildAttach,
    TaskScheduler.FromCurrentSynchronizationContext());
}
ProcessEmployeeDataFile returns void and it's not an asynchronous method.
When the code above is executed, it does not return to the controller until it completes the data processing. I think I am missing something here.
Please guide me to a solution.
Thanks,
Amol
Looks like you've misunderstood how await works.
Read this https://msdn.microsoft.com/en-us/library/hh191443.aspx#BKMK_WhatHappensUnderstandinganAsyncMethod
Setting something running in a task will allow it to run asynchronously so you can do something else while it's running.
When you need the result to continue, you use the await keyword.
By creating your task and immediately awaiting it, you're instantly waiting until the task resolves, making it effectively synchronous.
If you're happy to return to your view without waiting for processing to complete, I don't think you need await at all, since at no point do you want to wait for the result of the operation.
public Task ProcessValidFile(fileProcessDataModel model)
{
    employeeWorkedDataservice service = new employeeWorkedDataservice();
    return Task.Factory.StartNew(() =>
    {
        service.ProcessEmployeeDataFile(model.DataSetToProcess, OriginalFileName, this, model.Source);
    },
    CancellationToken.None,
    TaskCreationOptions.DenyChildAttach,
    TaskScheduler.FromCurrentSynchronizationContext());
}
[HttpPost]
public ActionResult FileUpload(HttpPostedFileBase fileUpload)
{
    Coreservice objVDS = new Coreservice();
    // validate the contents of the file
    model = objVDS.ValidateFileContents(fileUpload);
    // if the file is valid, start processing asynchronously.
    // This returns a task, but if we're not interested in waiting
    // for its results, we can ignore it.
    objVDS.ProcessValidFile(model);
    return View();
}
Regarding your comments:
I would seriously consider not passing your controller to your service, or having your service rely on the session and context since you're tightly coupling your business logic to your API controller.
Get the bits you need from the controller while you're in it and pass them to your service.
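For example (a sketch; the extra ProcessValidFile parameters are hypothetical), capture plain values from the request while it is still alive and hand those to the service rather than the controller or HttpContext:
[HttpPost]
public ActionResult FileUpload(HttpPostedFileBase fileUpload)
{
    var objVDS = new Coreservice();
    var model = objVDS.ValidateFileContents(fileUpload);

    // Capture plain values now, on the request thread; the background work
    // must not touch Controller, HttpContext, or Session later.
    var userName = User.Identity.Name;
    var originalFileName = fileUpload.FileName;

    objVDS.ProcessValidFile(model, originalFileName, userName);   // hypothetical overload taking plain values
    return View();
}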
I have to basically convert the existing functionality of file processing from synchronous to asynchronous (so that for a large file the user doesn't have to wait for the processing task).
That's not what async does; as I describe on my blog, async does not change the HTTP protocol.
What you want is some form of "fire and forget" on ASP.NET. I have another blog post that covers a few solutions. Note that using Task.Factory.StartNew is the most dangerous of all these solutions.
The best (read: most reliable) solution is to use a proper distributed architecture: your ASP.NET app should create a description of the work to be done and place that in a reliable queue (e.g., MSMQ); then have an independent backend (e.g., Win32 service) that processes the queue. This is complex, but much less error-prone than attempting to force ASP.NET to do something it was never meant to do.
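For instance, the enqueue side with MSMQ (System.Messaging) can be as small as the sketch below; FileWorkItem and the queue path are assumptions, and the matching Windows service would Receive() these messages and do the actual processing:
using System.Messaging;

// Sketch: describe the work as a small serializable message and enqueue it.
public class FileWorkItem
{
    public string OriginalFileName { get; set; }
    public string StoragePath { get; set; }     // where the uploaded file was saved
    public string RequestedBy { get; set; }
}

public static class WorkQueue
{
    private const string Path = @".\private$\fileprocessing";   // assumed queue path

    public static void Enqueue(FileWorkItem item)
    {
        if (!MessageQueue.Exists(Path))
            MessageQueue.Create(Path);

        using (var queue = new MessageQueue(Path))
        {
            // The default XmlMessageFormatter serializes the public properties.
            queue.Send(item, item.OriginalFileName);
        }
    }
}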
