How do I implement an async I/O bound operation from scratch? - c#

I'm trying to understand how and when to use async programming and got to I/O bound operations, but I don't understand them. I want to implement them from scratch. How can I do that?
Consider the example below which is synchronous:
private void DownloadBigImage() {
var url = "https://cosmos-magazine.imgix.net/file/spina/photo/14402/180322-Steve-Full.jpg";
new WebClient().DownloadFile(url, "image.jpg");
}
How do I implement the async version by only having the normal synchronous method DownloadBigImage without using Task.Run since that will use a thread from the thread pool only for waiting - that's just being wasteful!
Also do not use the special method that's already async! This is the purpose of this question: how do I make it myself without relying on methods which are already async? So, NO things like:
await new WebClient().DownloadFileTaskAsync(url, "image.jpg");
Examples and documentation available are very lacking in this regard. I found only this:
https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth
which says:
The call to GetStringAsync() calls through lower-level .NET libraries (perhaps calling other async methods) until it reaches a P/Invoke interop call into a native networking library. The native library may subsequently call into a System API call (such as write() to a socket on Linux). A task object will be created at the native/managed boundary, possibly using TaskCompletionSource. The task object will be passed up through the layers, possibly operated on or directly returned, eventually returned to the initial caller.
Basically I have to use a "P/Invoke interop call into a native networking library"... but how?

This is a great question which really isn't explained well in most texts about C# and async.
I searched for this for ages thinking I could and should maybe be implementing my own async I/O methods. If a method/library I was using didn't have async methods I thought I should somehow wrap these functions in code that made them asynchronous. It turns out that this isn't really feasible for most programmers. Yes, you can spawn a new thread using Thread.Start(() => {...}) and that does make your code asynchronous, but it also creates a new thread which is an expensive overhead for asynchronous operations. It can certainly free up your UI thread to ensure your app stays responsive, but it doesn't create a truly async operation the way that HttpClient.GetAsync() is a truly asynchronous operation.
This is because async methods in the .net libraries use something called "standard P/Invoke asynchronous I/O system in .NET" to call low level OS code that doesn't require a dedicated CPU thread while doing outbound IO (networking or storage). It actually doesn't dedicate a thread to its work and signals the .net runtime when it's done doing its stuff.
I'm not familiar with the details but this knowledge is enough to free me from trying to implement async I/O and make me focus on using the async methods already present in the .net libraries (such as HttpClient.GetAsync()). More interesting info can be found here (Microsoft async deep dive) and a nice description by Stephen Cleary here

I think this is a very interesting question and a fun learning exercise.
Fundamentally, you cannot use any existing API that is synchronous. Once it's synchronous there is no way to turn it truly asynchronous. You correctly identified that Task.Run and it's equivalents are not a solution.
If you refuse to call any async .NET API then you need to use PInvoke to call native APIs. This means that you need to call the WinHTTP API or use sockets directly. This is possible but I don't have the experience to guide you.
Rather, you can use async managed sockets to implement an async HTTP download.
Start with the synchronous code (this is a raw sketch):
using (var s = new Socket(...))
{
s.Connect(...);
s.Send(GetHttpRequestBytes());
var response = new StreamReader(new NetworkStream(s)).ReadToEnd();
}
This very roughly gets you an HTTP response as a string.
You can easily make this truly async by using await.
using (var s = new Socket(...))
{
await s.ConnectAsync(...);
await s.SendAsync(GetHttpRequestBytes());
var response = await new StreamReader(new NetworkStream(s)).ReadToEndAsync();
}
If you consider await cheating with respect to your exercise goals you would need to write this using callbacks. This is awful so I'm just going to write the connect part:
var s = new Socket(...)
s.BeginConnect(..., ar => {
//perform next steps here
}, null);
Again, this code is very raw but it shows the principle. Instead of waiting for an IO to complete (which happens implicitly inside of Connect) you register a callback that is called when the IO is done. That way your main thread continues to run. This turns your code into spaghetti.
You need to write safe disposal with callbacks. This is a problem because exception handling cannot span callbacks. Also, you likely need to write a read loop if you don't want to rely on the framework to do that. Async loops can be mind bending.

TLDR: Generally you can by using TaskCompletionSource.
If you only have blocking calls available then you cannot do this. But usually there are "old" asynchronous methods that do not use async nor Task, but rely instead on callbacks. In that case you can use a TaskCompletionSource to both create a Task that can be returned, and use it to set the Task to completed when the callback returns.
Example using the old .Net Framework 3.0 methods in WebClient (but programmed in later .Net that has Task):
public Task DownloadCallbackToAsync(string url, string filename)
{
using (var client = new WebClient())
{
TaskCompletionSource taskCreator = new TaskCompletionSource();
client.DownloadFileCompleted += (sender, args) => taskCreator.SetResult();
client.DownloadFileAsync(url, filename);
return taskCreator.Task;
}
}
Here you will imidiately initiate the call and return a Task. If you await the Task in the calling method you will not continue until the callback (DownloadFileCompleted) has occurred.
Note that this method itself is not async as it does not need to await a Task.

Create a new task that executes the synchronous code. The task will be executed by a thread of the threadpool.
private async Task DownloadBigImage()
{
await Task.Run(()=>
{
var url = "https://cosmos-magazine.imgix.net/file/spina/photo/14402/180322-Steve-Full.jpg";
new WebClient().DownloadFile(url, "image.jpg");
});
}

Related

Calling async methods from non-async code

I'm in the process of updating a library that has an API surface that was built in .NET 3.5. As a result, all methods are synchronous. I can't change the API (i.e., convert return values to Task) because that would require that all callers change. So I'm left with how to best call async methods in a synchronous way. This is in the context of ASP.NET 4, ASP.NET Core, and .NET/.NET Core console applications.
I may not have been clear enough - the situation is that I have existing code that is not async aware, and I want to use new libraries such as System.Net.Http and the AWS SDK that support only async methods. So I need to bridge the gap, and be able to have code that can be called synchronously but then can call async methods elsewhere.
I've done a lot of reading, and there are a number of times this has been asked and answered.
Calling async method from non async method
Synchronously waiting for an async operation, and why does Wait() freeze the program here
Calling an async method from a synchronous method
How would I run an async Task<T> method synchronously?
Calling async method synchronously
How to call asynchronous method from synchronous method in C#?
The problem is that most of the answers are different! The most common approach I've seen is use .Result, but this can deadlock. I've tried all the following, and they work, but I'm not sure which is the best approach to avoid deadlocks, have good performance, and plays nicely with the runtime (in terms of honoring task schedulers, task creation options, etc). Is there a definitive answer? What is the best approach?
private static T taskSyncRunner<T>(Func<Task<T>> task)
{
T result;
// approach 1
result = Task.Run(async () => await task()).ConfigureAwait(false).GetAwaiter().GetResult();
// approach 2
result = Task.Run(task).ConfigureAwait(false).GetAwaiter().GetResult();
// approach 3
result = task().ConfigureAwait(false).GetAwaiter().GetResult();
// approach 4
result = Task.Run(task).Result;
// approach 5
result = Task.Run(task).GetAwaiter().GetResult();
// approach 6
var t = task();
t.RunSynchronously();
result = t.Result;
// approach 7
var t1 = task();
Task.WaitAll(t1);
result = t1.Result;
// approach 8?
return result;
}
So I'm left with how to best call async methods in a synchronous way.
First, this is an OK thing to do. I'm stating this because it is common on Stack Overflow to point this out as a deed of the devil as a blanket statement without regard for the concrete case.
It is not required to be async all the way for correctness. Blocking on something async to make it sync has a performance cost that might matter or might be totally irrelevant. It depends on the concrete case.
Deadlocks come from two threads trying to enter the same single-threaded synchronization context at the same time. Any technique that avoids this reliably avoids deadlocks caused by blocking.
In your code snippet, all calls to .ConfigureAwait(false) are pointless because the return value is not awaited. ConfigureAwait returns a struct that, when awaited, exhibits the behavior that you requested. If that struct is simply dropped, it does nothing.
RunSynchronously is invalid to use because not all tasks can be processed that way. This method is meant for CPU-based tasks, and it can fail to work under certain circumstances.
.GetAwaiter().GetResult() is different from Result/Wait() in that it mimics the await exception propagation behavior. You need to decide if you want that or not. (So research what that behavior is; no need to repeat it here.) If your task contains a single exception then the await error behavior is usually convenient and has little downside. If there are multiple exceptions, for example from a failed Parallel loop where multiple tasks failed, then await will drop all exceptions but the first one. That makes debugging harder.
All these approaches have similar performance. They will allocate an OS event one way or another and block on it. That's the expensive part. The other machinery is rather cheap compared to that. I don't know which approach is absolutely cheapest.
In case an exception is being thrown, that is going to be the most expensive part. On .NET 5, exceptions are processed at a rate of at most 200,000 per second on a fast CPU. Deep stacks are slower, and the task machinery tends to rethrow exceptions multiplying their cost. There are ways of blocking on a task without the exception being rethrown, for example task.ContinueWith(_ => { }, TaskContinuationOptions.ExecuteSynchronously).Wait();.
I personally like the Task.Run(() => DoSomethingAsync()).Wait(); pattern because it avoids deadlocks categorically, it is simple and it does not hide some exceptions that GetResult() might hide. But you can use GetResult() as well with this.
I'm in the process of updating a library that has an API surface that was built in .NET 3.5. As a result, all methods are synchronous. I can't change the API (i.e., convert return values to Task) because that would require that all callers change. So I'm left with how to best call async methods in a synchronous way.
There is no universal "best" way to perform the sync-over-async anti-pattern. Only a variety of hacks that each have their own drawbacks.
What I recommend is that you keep the old synchronous APIs and then introduce asynchronous APIs alongside them. You can do this using the "boolean argument hack" as described in my MSDN article on Brownfield Async.
First, a brief explanation of the problems with each approach in your example:
ConfigureAwait only makes sense when there is an await; otherwise, it does nothing.
Result will wrap exceptions in an AggregateException; if you must block, use GetAwaiter().GetResult() instead.
Task.Run will execute its code on a thread pool thread (obviously). This is fine only if the code can run on a thread pool thread.
RunSynchronously is an advanced API used in extremely rare situations when doing dynamic task-based parallelism. You're not in that scenario at all.
Task.WaitAll with a single task is the same as just Wait().
async () => await x is just a less-efficient way of saying () => x.
Blocking on a task started from the current thread can cause deadlocks.
Here's the breakdown:
// Problems (1), (3), (6)
result = Task.Run(async () => await task()).ConfigureAwait(false).GetAwaiter().GetResult();
// Problems (1), (3)
result = Task.Run(task).ConfigureAwait(false).GetAwaiter().GetResult();
// Problems (1), (7)
result = task().ConfigureAwait(false).GetAwaiter().GetResult();
// Problems (2), (3)
result = Task.Run(task).Result;
// Problems (3)
result = Task.Run(task).GetAwaiter().GetResult();
// Problems (2), (4)
var t = task();
t.RunSynchronously();
result = t.Result;
// Problems (2), (5)
var t1 = task();
Task.WaitAll(t1);
result = t1.Result;
Instead of any of these approaches, since you have existing, working synchronous code, you should use it alongside the newer naturally-asynchronous code. For example, if your existing code used WebClient:
public string Get()
{
using (var client = new WebClient())
return client.DownloadString(...);
}
and you want to add an async API, then I would do it like this:
private async Task<string> GetCoreAsync(bool sync)
{
using (var client = new WebClient())
{
return sync ?
client.DownloadString(...) :
await client.DownloadStringTaskAsync(...);
}
}
public string Get() => GetCoreAsync(sync: true).GetAwaiter().GetResult();
public Task<string> GetAsync() => GetCoreAsync(sync: false);
or, if you must use HttpClient for some reason:
private string GetCoreSync()
{
using (var client = new WebClient())
return client.DownloadString(...);
}
private static HttpClient HttpClient { get; } = ...;
private async Task<string> GetCoreAsync(bool sync)
{
return sync ?
GetCoreSync() :
await HttpClient.GetString(...);
}
public string Get() => GetCoreAsync(sync: true).GetAwaiter().GetResult();
public Task<string> GetAsync() => GetCoreAsync(sync: false);
With this approach, your logic would go into the Core methods, which may be run synchronously or asynchronously (as determined by the sync parameter). If sync is true, then the core methods must return an already-completed task. For implemenation, use synchronous APIs to run synchronously, and use asynchronous APIs to run asynchronously.
Eventually, I recommend deprecating the synchronous APIs.
I just went thru this very thing with the AWS S3 SDK. Used to be sync, and I built a bunch of code on that, but now it's async. And that's fine: they changed it, nothing to be gained by moaning about it, move on.So I need to update my app, and my options are to either refactor a large part of my app to be async, or to "hack" the S3 async API to behave like sync.I'll eventually get around to the larger async refactoring - there are many benefits - but for today I have bigger fish to fry so I chose to fake the sync.Original sync code was ListObjectsResponse response = api.ListObjects(request); and a really simple async equivalent that works for me is Task<ListObjectsV2Response> task = api.ListObjectsV2Async(rq2);ListObjectsV2Response rsp2 = task.GetAwaiter().GetResult();
While I get it that purists might pillory me for this, the reality is that this is just one of many pressing issues and I have finite time so I need to make tradeoffs. Perfect? No. Works? Yes.
You Can Call Async Method From non-async method .Check below Code .
public ActionResult Test()
{
TestClass result = Task.Run(async () => await GetNumbers()).GetAwaiter().GetResult();
return PartialView(result);
}
public async Task<TestClass> GetNumbers()
{
TestClass obj = new TestClass();
HttpResponseMessage response = await APICallHelper.GetData(Functions.API_Call_Url.GetCommonNumbers);
if (response.IsSuccessStatusCode)
{
var result = response.Content.ReadAsStringAsync().Result;
obj = JsonConvert.DeserializeObject<TestClass>(result);
}
return obj;
}

c# blocking code in async method

I’m using MvvmCross and the AsyncEx library within a Windows 10 (UWP) App.
In a ViewModel, I have an INotifyTaskCompletion property (1) which is wired-up to an Async method in the ViewModel (2)
In (2), I call an Async library method which:
Checks a local cache
Downloads data asynchronously
Adds the data to the cache
The caching code cannot be made asynchronous and so the library method contains both blocking and asynchronous code.
Q. What’s the best way to prevent blocking the UI thread?
I understand from Stephen Cleary to not to block in asynchronous code and not use Task.Run in library methods. So do I have to….
Move the caching calls into (2) e.g.
Use Task.Run (to check the cache)
Call the library method asynchronously
Use Task.Run again (to cache the data)?
Is there a better way?
If you have completely synchronous code which you can't change to make it return an awaitable, and want to make it asynchronous, then yes, your only choice if you want to use async/await is to use Task.Run().
Something like:
public async Task<T> MyMethod()
{
T result = await Task.Run(() => CheckCacheOnSyncMethodICantChange());
if(result != null)
{
result = await MyLibraryMethodThatReturnsATask();
await Task.Run(() => AddToCacheOnSyncMethodICantChange(result));
}
return result;
}
Should be ok.

what are the circumstances of using .Wait() when calling async methods

I have the following async long running method inside my asp.net mvc-5 web application :-
public async Task<ScanResult> ScanAsync(string FQDN)
{
// sample of the operation i am doing
var c = await context.SingleOrDefaultAsync(a=>a.id == 1);
var list = await context.Employees.ToListAsync();
await context.SaveChangesAsync();
//etc..
}
and i am using Hangfire tool which support running background jobs to call this async method on timely basis, but un-fortuntly the hangefire tool does not support calling async methods directly . so to overcome this problem i created a sync version of the above method , as follow:-
public void Scan()
{
ScanAsync("test").Wait();
}
then from the HangFire scheduler i am calling the sync method as follow:-
RecurringJob.AddOrUpdate(() => ss.Scan(), Cron.Minutely);
so i know that using .Wait() will mainly occupy the iis thread during the method execution ,, but as i mentioned i need to do it in this way as i can not directly call an async TASK inside the hangefire scheduler .
so what will happen when i use .Wait() to call the async method ?, will the whole method's operations be done in a sync way ? for example as shown above i have three async operations inside the ScanAsync() ;SingleOrDefualtAsync,ToListAsync & SaveChangesAsync, so will they be executed in sync way because i am calling the ScanAsync method using .wait() ?
so what will happen when i use .Wait() to call the async method ?,
will the whole method's operations be done in a sync way ? for example
as shown above i have three async operations inside the ScanAsync()
;SingleOrDefualtAsync,ToListAsync & SaveChangesAsync, so will they be
executed in sync way because i am calling the ScanAsync method using
.wait() ?
The methods querying the database will still be executed asynchronously, but the fact that you're calling Wait means that even though you're releasing the thread, it wont return to the ASP.NET ThreadPool as you're halting it.
This is also a potential for deadlocks, as ASP.NET has a custom synchronization context which makes sure the context of the request is availiable when in a continuation of an async call.
I would recommend that instead, you'd use the synchronous API provided by entity-framework, as you won't actually be enjoying the scalability one can get from asynchronous calls.
Edit:
In the comments, you asked:
As i am currently doing with hangefire eliminate the async effect ? if yes then will it be better to use sync methods instead ? or using sync or async with hangeffire will be exactly the same
First, you have to understand what the benefits of async are. You don't do it because it's cool, you do it because it serves a purpose. What is that purpose? Being able to scale under load. How does it do that? When you await an async method, control is yielded back to the caller. For example, you have an incoming request, you query you database. You can either sit there and wait for the query to finish, or you can re-use that thread to serve more incomong requests. That's the true power.
If you don't actually plan on receiving a decent amount of requests (such that you'll starve the thread-pool), you won't actually see any benefit from async. Currently, the way you've implemented it, you won't see any of those benefits because you're blocking the async calls. All you'll see, possibly, are deadlocks.
This very much depends on the way HangFire is implemented. If it just queuing tasks to be invoked in ThreadPool the only effect will be, that one of your threads will be blocked until the request is ended. However if there is a custom SynchronizationContext this can lead to a serious deadlock.
Consider, if you really want to wait for your scheduled job to be done. Maybe all you want is just a fire and forget pattern. This way your method will be like:
public void Scan()
{
ScanAsync("test"); // smoothly ignore the task
}
If you do need to wait, you can instead try using async void method:
public async void Scan()
{
await ScanAsync("test");
DoSomeOtherJob();
}
There are many controversies about using async void as you cannot wait for this method to end, nor you can handle possible errors.
However, in event driven application this can be the only way. For more informations you can refer to: Async Void, ASP.Net, and Count of Outstanding Operations

What is a puppet task in C# asynchronous programming?

in a discussion on asynchronous programming in C# the author speaks matter of factly of "puppet tasks". It shows up in various places on stack overflow. But it is only used, never explained. Here are some examples of where it is mentioned here, here, here, and in the book Async in C# 5.0.
What should I make of puppet tasks or puppets in programming? Is it indicating some mockup? A task wrapper or what?
Thank you.
Before the creation of Task in C#, Microsoft had implemented other means of creating asynchronous events. One was the Event-Asynchronous Pattern (EAP), where a callback method is fired when the asynchronous event finishes. Another is the Asynchronous Programming Model (APM) which uses IAsyncResult and Begin/End style methods.
When the Task-based Asynchronous Pattern (TAP) was added, Microsoft made a concerted effort to add Task based versions of all of the async APIs in the BCL. However, they couldn't do everything, and on top of that, many third-party libraries were already out there using either EAP or APM. At the same time though, they realized the value of using the TAP and the value async and await bring. As a result, they created the TaskCompletionSource. This serves as a wrapper to make non-TAP APIs work with TAP. Essentially, you create what many refer to as a puppet task. This is a task that really only exists to turn an EAP or APM (or other async pattern) method into a TAP method. For example, say you have a class, DownloadFile. This class fires a DownloadComplete event when the file is done. Normally you'd do something like:
DownloadFile myfile = new DownloadFile();
myfile.Complete += SomeMethodToHandleCompletion;
myfile.Download();
Then, at some point your method fires. That's great, but we want tasks. So what you could do is something like:
public Task<string> DownloadFileAsync(string url)
{
var tcs = new TaskCompletionSource<string>();
DownloadFile myfile = new DownloadFile(url);
myfile.Complete += (theData) =>
{
tcs.SetResult(theData);
};
return tcs.Task;
}
That's a simplistic example since I'm not handling exceptions or anything, but essentially, now I've turned an event handler (EAP) into something I can use with async/await by creating a "puppet task" that simply does the bidding of the event. When the event finishes, we simply say "hey the Task is done too". As a result, the following works:
string downloadedFile = await DownloadFileAsync("http://www.some.url.example");

Execute asynchronous code as synchronous

I have the following asynchronous code:
public async Task<List<PreprocessingTaskResult>> Preprocess(Action onPreprocessingTaskFinished)
{
var preprocessingTasks = BuildPreprocessingTasks();
var preprocessingTaskResults = new List<PreprocessingTaskResult>();
while (preprocessingTasks.Count > 0)
{
//Wait till at least one task is completed
await TaskEx.WhenAny(preprocessingTasks.ToArray());
onPreprocessingTaskFinished();
}
return
}
And the asynchronous usage code
var preprocessingTaskResults = await Preprocess(someAction);
For some cases I need to call it in synchronous way. For simpler cases (when async method returns just a task) I used to do the following:
var someTask = AsyncMethod();
Task.Wait(someTask);
But I am confused how I should implement it here.
A task's Result property will block until the task completes and then return its result:
List<PreprocessingTaskResult> result = Preprocess(someAction).Result;
http://msdn.microsoft.com/en-us/library/dd321468(v=vs.110).aspx
There is no easy way to call asynchronous code synchronously. Stephen Toub covers some various approaches here but there is no approach that works in all situations.
The best solution is to change the synchronous calling code to be asynchronous. With the advent of Microsoft.Bcl.Async and recent releases of Xamarin, asynchronous code is now supported on .NET 4.0 (and higher), Windows Phone 7.1 (and higher), Windows Store, Silverlight 4 (and higher), iOS/MonoTouch, Android/MonoDroid, and portable class libraries for any mix of those platforms.
So there's very little reason these days not to use async.
But if you absolutely need a synchronous API, then the best solution is to make it synchronous all the way. If you do need both asynchronous and synchronous APIs, this will cause code duplication which is unfortunate but it is the best solution at this time.

Categories