I came across this C# code sample on MSDN that shows how to use a delegate to wrap a callback method for an asynchronous DNS lookup operation:
http://msdn.microsoft.com/en-us/library/ms228972.aspx
As you can see from the code, a counter is incremented by the initiating method for each request and decremented once each time the callback method is executed. The initiating method sits in a loop until the counter reaches zero, keeping the UI updated as it waits.
What I don't see in the sample is a robust method for timing out the initiating method if the process takes too long. My questions are:
What is a good way to institute a robust time-out mechanism in this example? Is it necessary to make any calls to clean up any pending DNS lookups if the decision is made to abort the entire operation? If anyone knows of a good resource or example that demonstrates robust time-out handling in this call/callback pair scenario, I'd like to know about it.
Is this scenario better served by the async-await pattern added since VS2012?
Are there any tips or domain specific concerns related to executing in a Windows Phone context?
The Dns class specifically does not have a facility for cancelling requests. Some APIs do (generated WCF proxies); others don't (Dns and many others).
As for your questions (answering in the context of Timeout):
There are multiple ways of making the uber-request (the async request that wraps the multiple DNS requests) time out. One can simply check the elapsed time at the point where the code calls UpdateUserInterface. That would be the simplest way in the context of this sample (though it will probably fall short in most real-world scenarios, unless you are willing to tie up a thread to do it). As for clean-up: if you mean memory clean-up after the operation completes, that will happen (unless there's a library bug). If you instead mean clean-up to conserve resources, the truth is that most people don't bother in most cases. The cost and added complexity (coupled with the fact that it's a "less travelled path", meaning less testing) mean that calls that should be aborted are often just left alone to complete in their own sweet time. One just needs to make sure the continuation code does not do anything bad.
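The elapsed-time check described above could look roughly like the sketch below. The names PollingTimeout, WaitForCompletion and the pending delegate are made up for illustration; pending stands in for reading the MSDN sample's request counter, and the 50 ms sleep is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PollingTimeout
{
    // Returns true if the pending count reached zero before the timeout elapsed.
    // 'pending' is a hypothetical accessor for the sample's request counter.
    public static bool WaitForCompletion(Func<int> pending, TimeSpan timeout, Action updateUserInterface)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        while (pending() > 0)
        {
            if (stopwatch.Elapsed >= timeout)
            {
                return false; // give up; late callbacks may still fire afterwards
            }
            updateUserInterface();
            Thread.Sleep(50); // avoid a hot spin; this thread stays occupied while waiting
        }
        return true;
    }
}
```

Note that returning false does not stop the outstanding lookups; the callbacks can still run later, so they must tolerate being ignored.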
Doing a non-blocking timeout with Tasks (async/await) is actually very compelling, if you have access to it. All you need to do is the following (pseudo code):
// Pseudo code: GetEntryAsync and MyTimeoutException are placeholders, and the
// element types (string hosts, IPHostEntry results) are assumed.
async Task<IPHostEntry[]> GetAddresses(IEnumerable<string> hosts, int timeoutms)
{
    var tasks = new List<Task<IPHostEntry>>();
    foreach (var host in hosts)
    {
        tasks.Add(GetEntryAsync(host));
    }
    var wait = Task.Delay(timeoutms); // create a task that will fire when time passes.
    var any = await Task.WhenAny(
        wait,
        Task.WhenAll(tasks)); // Wait for EITHER timeout or ALL the dns reqs.
    if (any == wait)
    {
        throw new MyTimeoutException();
    }
    return tasks.Select(t => t.Result).ToArray(); // Should also check for exceptions here.
}
No tips really. A good portion of async operations are not even cancellable and, at least in my opinion, are not worth rewriting just to get cancellation semantics.
I've been struggling for some days now to work out where to use await and where not to.
I have a Repository class which fetches the data from database.
Using Entity Framework, the code would be something like this:
public async Task<List<Object>> GetAsync()
{
    return await context.Set<Object>().ToListAsync();
}
and the consumer:
var data = await GetAsync();
At the top level I'm awaiting this method too.
Should I use await in only one of these methods?
Is there a performance penalty, in terms of resource usage and a new thread being created, each time you use await?
I have checked the questions listed in the comments, and they do not refer to the performance issues; they just say that you can do it. I want to know the best practice and the reasons for or against doing so.
I'd like to add to that.
There are some async methods where there is no need to use async/await keywords. It is important to detect this kind of misuse because adding the async modifier comes at a price.
E.g., you don't need the async/await keywords in your example.
public Task<List<Object>> GetAsync()
{
    return context.Set<Object>().ToListAsync();
}
And then:
var data = await GetAsync();
That will be just fine. In this case, you are returning the Task<List<Object>> and then awaiting it at the place where you work with the objects directly.
I recommend installing async await helper
Let me first capture the essence of your question: the confusion is about where to use async in the complete chain of calls and where not to, and how to assess the performance impact, since it might lead to the creation of more threads. If there is more to it than this, add details in the comments and I will try to address them too.
Let's divide and tackle each of them one by one.
Where to use async in the chain of calls, and where not?
Since you are using Entity Framework to access a database, I can safely assume you are doing IO-based asynchronous processing, which is the most prominent use case for async across languages and frameworks. Use cases for CPU-based asynchronous processing are relatively limited (I will explain them too).
Async is a scalability feature, especially for IO processing, rather than a performance feature. In simple words, async processing lets a hosted server handle many times more IO-bound calls, since the calls do not block: they just hand the request over to the network while the processing thread goes back to the pool, ready to serve another request. The hand-over takes a few milliseconds.
When the processing is complete, a software thread just needs to receive the result and pass it back to the client, again in a few milliseconds; it is mostly under 1 ms for a pure pass-through IO call with no logic.
What are the benefits?
Imagine making a synchronous IO call to a database instead, where each thread involved just waits for the result to arrive, which may take a few seconds. The impact is highly negative: depending on the thread pool size you may serve 25 - 50 requests at most, and even that relies on the number of cores available; the threads hold resources the whole time while they sit idle waiting for the response.
With synchronous calls there is no way to serve 1000+ requests in the same setup, and I am being extremely conservative. Async can have a huge scalability impact: for lightweight calls, a single hosted process may serve millions of requests with ease.
With that background, where to use async in the complete chain?
Everywhere feasible, from beginning to end: from the entry point down to the actual exit point that makes the IO call, since that is the call that actually releases the pool thread as it dispatches the request over the network.
Do remember, though, that an await at a given point stops the rest of the same method from running until it completes, even though it releases the thread. So if there are multiple independent calls, it is better to aggregate them using Task.WhenAll and await the representative task; it completes when all of them have finished, whatever their final state (success or error).
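As a rough illustration of aggregating independent calls with Task.WhenAll, here is a minimal sketch. FetchLengthAsync is a made-up stand-in for a real IO-bound call (such as a database query), and the 100 ms delay only simulates latency.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for an IO-bound call such as a database query.
    private static async Task<int> FetchLengthAsync(string key)
    {
        await Task.Delay(100); // simulated IO latency
        return key.Length;
    }

    public static async Task<int> SumLengthsAsync(params string[] keys)
    {
        // Start every call first; none of them has been awaited yet.
        Task<int>[] pending = keys.Select(k => FetchLengthAsync(k)).ToArray();

        // Await the combined task once; it completes when all calls have finished.
        int[] results = await Task.WhenAll(pending);
        return results.Sum();
    }
}
```

Because all calls are started before the single await, the total wait is close to the slowest call rather than the sum of all of them.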
If the async chain is broken in between by something like Task.Wait or Task.Result, it is no longer a pure async call and will block the calling thread pool thread.
How can async be further enhanced?
In pure library calls, where async is initiated on the thread pool, the dispatching thread can differ from the receiving one, and the call doesn't need to re-enter the same context, you should use ConfigureAwait(false). It means the continuation will not wait to re-enter the original context, which is a performance boost.
Like await, it makes sense to use ConfigureAwait(false) across the whole chain, from entry to end. This is valid only for libraries that rely extensively on the thread pool.
Is a thread created?
Best read Stephen Cleary's article, There Is No Thread.
A genuine IO async call uses hardware-based concurrency for processing and does not block software threads.
Variations
CPU-based asynchronous processing, where you move work to the background because the current thread needs to stay responsive, mostly in the case of a UI framework like WPF.
Use cases
All kinds of systems, especially non-MS frameworks like Node.js, have async processing as their underlying principle, and the database server cluster on the receiving end is tuned to receive millions of calls and process them.
B2C calls, where each request is expected to be lightweight with a limited payload.
Edit 1:
In this specific case, ToListAsync is already asynchronous, so you can skip async/await as noted in various comments. Do review Stephen Cleary's article, though: in general that may not be a very good strategy, since the gains are minimal and the negative impact of incorrect usage can be high.
Probably this question has already been asked, but I never found a definitive answer. Let's say that I have a Web API 2.0 application hosted on IIS. I think I understand that the best practice (to prevent deadlocks on the client) is to always use async methods from the GUI event down to the HttpClient calls. This is good and it works. But what is the best practice when the client application does not have a GUI (e.g. a Windows Service or console application) and has only synchronous methods from which to make the call? In this case, I use the following logic:
void MySyncMethodOnMyWindowServiceApp()
{
    var list = GetDataAsync().Result.ToObject<List<MyClass>>();
}
async Task<JArray> GetDataAsync()
{
    var response = await Client.GetAsync(<...>).ConfigureAwait(false);
    return await response.Content.ReadAsAsync<JArray>().ConfigureAwait(false);
}
But unfortunately this can still cause deadlocks on the client, which occur at random times on random machines.
The client app stops at this point and never returns:
var response = await Client.GetAsync(<...>).ConfigureAwait(false);
If it's something that can be run in the background and isn't forced to be synchronous, try wrapping the code (that calls the async method) in a Task.Run(). I'm not sure that'll solve a "deadlock" problem (if it's something out of sync, that's another issue), but if you want to benefit from async/await, if you don't have async all the way down, I'm not sure there's a benefit unless you run it in a background thread. I had a case where adding Task.Run() in a few places (in my case, from an MVC controller which I changed to be async) and calling async methods not only improved performance slightly, but it improved reliability (not sure that it was a "deadlock" but seemed like something similar) under heavier load.
You will find that using Task.Run() is regarded by some as a bad way to do it, but I really couldn't see a better way to do it in my situation, and it really did seem to be an improvement. Perhaps this is one of those things where there's the ideal way to do it vs. the way to make it work in the imperfect situation that you're in. :-)
[Updated due to requests for code]
So, as someone else posted, you should do "async all the way down". In my case, my data wasn't async, but my UI was. So, I went async down as far as I could, then I wrapped my data calls with Task.Run in such a way that it made sense. That's the trick, I think: to figure out whether it makes sense that things can run in parallel; otherwise, you're just being synchronous (if you use async and immediately resolve it, forcing it to wait for the answer). I had a number of reads that I could perform in parallel.
In the above example, I think you have to async up as far as makes sense, and then at some point, determine where you can spin off a thread and perform the operation independent of the other code. Let's say you have an operation that saves data, but you don't really need to wait for a response -- you're saving it and you're done. The only thing you might have to watch out for is not to close the program without waiting for that thread/task to finish. Where it makes sense in your code is up to you.
Syntax is pretty easy. I took existing code, changed the controller to an async returning a Task of my class that was formerly being returned.
var myTask = Task.Run(() =>
{
//...some code that can run independently.... In my case, loading data
});
// ...other code that can run at the same time as the above....
await Task.WhenAll(myTask, otherTask);
//..or...
await myTask;
//At this point, the result is available from the task
myDataValue = myTask.Result;
See MSDN for probably better examples:
https://msdn.microsoft.com/en-us/library/hh195051(v=vs.110).aspx
[Update 2, more relevant for the original question]
Let's say that your data read is an async method.
private async Task<MyClass> Read()
You can call it, save the task, and await on it when ready:
var runTask = Read();
//... do other code that can run in parallel
await runTask;
So, for this purpose, calling async code, which is what the original poster is requesting, I don't think you need Task.Run(), although I don't think you can use "await" unless you're an async method -- you'll need an alternate syntax for Wait.
The trick is that without having some code to run in parallel, there's little point in it, so thinking about multi-threading is still the point.
Using Task<T>.Result is the equivalent of Wait, which synchronously blocks the thread. Having async methods on the WebApi and then having all the callers block on them synchronously effectively makes the WebApi method synchronous. Under load you will deadlock if the number of simultaneous Waits exceeds the server/app thread pool.
So remember the rule of thumb: "async all the way down". You want the long-running task (getting the collection) to be async. If the calling method must be sync, make the conversion from async to sync (using either Result or Wait) as close to the "ground" as possible. Keep the long-running process async and keep the sync portion as short as possible. That will greatly reduce the length of time that threads are blocked.
So for example you can do something like this.
void MySyncMethodOnMyWindowServiceApp()
{
    List<MyClass> myClasses = GetMyClassCollectionAsync().Result;
}
async Task<List<MyClass>> GetMyClassCollectionAsync()
{
    var data = await GetDataAsync(); // <- long running call to remote WebApi?
    return data.ToObject<List<MyClass>>();
}
The key part is that the long-running task remains async and is not blocked, because await is used.
Also, don't confuse responsiveness with scalability. Both are valid reasons for async. Yes, responsiveness is a reason for using async (to avoid blocking the UI thread). You are correct that this wouldn't apply to a back-end service; however, this isn't why async is used on a WebApi. The WebApi is also a non-GUI back-end process. If the only advantage of async code were responsiveness of the UI layer, then WebApi would be sync code from start to finish. The other reason for using async is scalability (avoiding deadlocks), and this is the reason why WebApi calls are plumbed async. Keeping the long-running processes async helps IIS make more efficient use of a limited number of threads. By default there are only 12 worker threads per core. This can be raised, but that isn't a magic bullet either, as threads are relatively expensive (about 1MB overhead per thread). await allows you to do more with less: more concurrent long-running processes on fewer threads before a deadlock occurs.
The problem you are having with deadlocks must stem from something else. Your use of ConfigureAwait(false) prevents deadlocks here. Solve the bug and you are fine.
See Should we switch to use async I/O by default? to which the answer is "no". You should decide on a case by case basis and choose async when the benefits outweigh the costs. It is important to understand that async IO has a productivity cost associated with it. In non-GUI scenarios only a few targeted scenarios derive any benefit at all from async IO. The benefits can be enormous, though, but only in those cases.
Here's another helpful post: https://stackoverflow.com/a/25087273/122718
All -
More of an approach question. I have a web service that I need to performance test from a client machine. So in essence I am writing a quick multi-threaded WPF app (which has a gauge/speedometer in it) to visually indicate the request/response time. It is event-driven, so when I click a button the app will start making requests. I am only concerned with how much time the request/response took and not with the response value itself (for now).
Here is my thought-process currently:
1) I need to create as many threads as I can (as many as my client machine can handle) and measure the performance. Two options I can think of: creating new Thread instances (so I have full control over the thread) or using a BackgroundWorker (so I can pass the value from background processing back to the UI). Assumption: I will have to loop through the thread creation code, so I can keep creating multiple threads with either approach.
2) I don't need any progress reporting, so that is not a criterion for choosing a multi-threaded approach.
3) I do need a callback method, because that should pass back the value (time taken for a request/response to the web service).
4) When I am updating a variable with a value, I will use one of the available synchronization methods.
5) I haven't really used the Task API from the 4.0 framework, but is that something I should consider?
Does the above approach look good, or am I missing something?
Really appreciate any help !!!
A lot of people have recommended Tasks, which is a good idea, I think. I don't mind using bare threads either myself, and as far as which one you should use, either will do fine. The main thing to be wary of is the exception-handling behavior, which varies between the two. If you have an unhandled exception in a typical thread, it will bring down your process, so you probably want to have something like this (only not oversimplified ;)):
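int errorCount = 0;
void Run()
{
    Thread t = new Thread(delegate()
    {
        try
        {
            // your code
        }
        catch (Exception)
        {
            // Count the failure instead of letting it take down the process.
            Interlocked.Increment(ref errorCount);
        }
    });
    t.Start(); // the thread does nothing until it is started
}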
On the other hand, tasks have a more controlled way of dealing with errors: they throw them wrapped in an AggregateException when you call the Task.Wait method.
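To make the task behaviour concrete, here is a small sketch of a failing task whose exception surfaces as an AggregateException on Wait. The class name and the simulated TimeoutException are invented for the example.

```csharp
using System;
using System.Threading.Tasks;

public static class TaskErrorSketch
{
    // Runs a task that fails and returns the message of the wrapped exception.
    public static string RunAndCapture()
    {
        Action work = () => { throw new TimeoutException("simulated request timeout"); };
        Task failing = Task.Run(work);
        try
        {
            failing.Wait(); // rethrows the failure wrapped in an AggregateException
            return "no error";
        }
        catch (AggregateException ex)
        {
            // The original exception is available as InnerException.
            return ex.InnerException.Message;
        }
    }
}
```

Unlike the bare-thread case, the exception does not escape on the worker thread; it is stored in the task until someone observes it.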
I'm thinking about exceptions for your case in particular because I assume you would end up with timeout errors during a stress test.
Parallel.ForEach is probably worth looking at as well, but honestly, since you're trying a stress test and not a real-world scenario, I might avoid it to be sure how many threads I'm running at once - I believe PLINQ stuff does some load balancing on the client side.
Callback methods are easy for all these methods. I'd just use a method call, rather than passing a callback, but that's an implementation detail.
I'd stay away from the BackgroundWorker, because it's really meant for UI-related asynchronous operations, and is maybe a little bit bloated in this particular context.
I'm still fairly new to WF so bear with me if I don't get this worded correctly the first time. ;)
If you're doing selects against a well-normalized database, using primary keys, returning single records, in a fairly low volume environment (a few hundred requests per day), does it really make a difference whether you use CodeActivity vs AsyncCodeActivity?
While I've got some additional research to do on hosting and execution, it will be possible, but not probable, for multiple requests to be received at or near the same time. I'm not sure if that will change the answer or not.
Thanks!
Microsoft used non async in their ExecuteSqlQuery activity: http://wf.codeplex.com/releases/view/43585
Async Activities:
"This is useful for custom activities that must perform asynchronous work without holding the workflow scheduler thread and blocking any activities that may be able to run in parallel."
"As a result of going asynchronous, an AsyncCodeActivity may induce an idle point during execution. Due to the volatile nature of asynchronous work, an AsyncCodeActivity always creates a no persist block for the duration of the activity’s execution. This prevents the workflow runtime from persisting the workflow instance in the middle of the asynchronous work, and also prevents the workflow instance from unloading while the asynchronous code is executing."
Source: http://msdn.microsoft.com/en-us/library/ee358731.aspx
Edit: I noticed that I only pointed out the disadvantages of using async. I would also consider the responses of Ron and Tim to make a better decision.
In general I strongly encourage activity developers who are doing any kind of I/O to use AsyncCodeActivity and to call the underlying Async APIs whenever possible. Even if the query is short this is always preferable.
Obviously - it's not going to make a difference unless you're actually calling an Async API inside your activity implementation.
That said, even if it makes a difference it might not make a noticeable difference in many apps. Potential reasons:
The query just runs too fast.
You aren't running multiple queries in parallel. (Running many async operations in parallel is faster than doing them synchronously and thereby sequentially.)
You don't run a large number of workflows in parallel such as would be needed to experience thread contention.
Is it possible to purge a ThreadPool?
Remove items from the ThreadPool?
Anything like that?
ThreadPool.QueueUserWorkItem(GetDataThread);
RegisteredWaitHandle Handle = ThreadPool.RegisterWaitForSingleObject(CompletedEvent, WaitProc, null, 10000, true);
Any thoughts?
I recommend using the Task class (added in .NET 4.0) if you need this kind of behaviour. It supports cancellation, and you can have any number of tasks listening to the same cancellation token, which enables you to cancel them all with a single method call.
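A minimal sketch of several tasks sharing one cancellation token follows. The infinite Task.Delay is only a placeholder for real work that observes the token, and the class and method names are invented for the example.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SharedCancellationSketch
{
    // Starts 'count' workers that observe the same token, cancels them with a
    // single call, and returns how many ended in the Canceled state.
    public static int StartAndCancel(int count)
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;
            Task[] workers = Enumerable.Range(0, count)
                .Select(_ => Task.Run(async () =>
                {
                    // Placeholder for long-running work; Task.Delay observes the token.
                    await Task.Delay(Timeout.Infinite, token);
                }, token))
                .ToArray();

            cts.Cancel(); // one call signals every worker

            try
            {
                Task.WaitAll(workers);
            }
            catch (AggregateException)
            {
                // Cancelled tasks surface here as TaskCanceledException entries.
            }

            return workers.Count(t => t.IsCanceled);
        }
    }
}
```

Cancellation is cooperative: work that never checks the token (or never passes it to a cancellable API) will keep running.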
Updated (non-4.0 solution):
You really only have two choices. One: implement your own event demultiplexer (this is far more complex than it appears, due to the 64-handle wait limitation); I can't recommend this - I had to do it once (in unmanaged code), and it was hideous.
That leaves the second choice: Have a signal to cancel the tasks. Naturally, RegisteredWaitHandle.Unregister can cancel the RWFSO part. The QUWI is more complex, but can be done by making the action aware of a "token" value. When the action executes, it first checks the token value against its stored token value; if they are different, then it shouldn't do anything.
One major thing to consider is race conditions. Just keep in mind that there is a race condition between cancelling an action and the ThreadPool executing it, so it is possible to see actions running after cancellation.
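The token idea for QueueUserWorkItem might be sketched like this. CancellableQueue is a hypothetical helper written for this illustration, not the CallbackContext type from Nito.Async.

```csharp
using System;
using System.Threading;

public sealed class CancellableQueue
{
    private int _currentToken; // bumped on every CancelAll

    // Queues 'action'; it only runs if CancelAll was not called before the check.
    public void Queue(Action action)
    {
        int tokenAtQueueTime = Volatile.Read(ref _currentToken);
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (Volatile.Read(ref _currentToken) != tokenAtQueueTime)
            {
                return; // stale item: cancelled after it was queued
            }
            action();
        });
    }

    // Invalidates every queued item that has not yet passed the token check.
    public void CancelAll()
    {
        Interlocked.Increment(ref _currentToken);
    }
}
```

The race mentioned above still applies: an item that has already passed the check when CancelAll is called will run to completion.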
I have a blog post on this concept, which I call "asynchronous callback contexts". The CallbackContext type mentioned in the blog post is available in the Nito.Async library.
There's no interface for removing a queued item. However, nothing stops you from "poisoning" the delegate so that it returns immediately.
edit
Based on what Paul said, I'm thinking you might also want to consider a pipelined architecture, where you have a fixed number of threads reading from a blocking queue (like .NET 4.0's BlockingCollection on a ConcurrentQueue). This way, if you want to cancel items, you can just access the queue yourself.
Having said that, Stephen's advice about Task is likely better, in that it gives you all the control you would realistically want, without all the hard work that rolling your own pipelines involves. I mention this only for completeness.
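For reference, the pipelined approach could be sketched roughly as follows, assuming .NET 4.0 or later. WorkPipeline and its members are invented names, and error handling inside the workers is left out.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class WorkPipeline : IDisposable
{
    private readonly BlockingCollection<Action> _queue =
        new BlockingCollection<Action>(new ConcurrentQueue<Action>());
    private readonly Task[] _workers;

    public WorkPipeline(int workerCount)
    {
        _workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            // Each worker blocks on the queue until an item arrives or adding completes.
            _workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (Action item in _queue.GetConsumingEnumerable())
                {
                    item();
                }
            }, TaskCreationOptions.LongRunning);
        }
    }

    public void Enqueue(Action item)
    {
        _queue.Add(item);
    }

    // Removes items that no worker has picked up yet; returns how many were dropped.
    public int Purge()
    {
        int removed = 0;
        Action discarded;
        while (_queue.TryTake(out discarded))
        {
            removed++;
        }
        return removed;
    }

    public void Dispose()
    {
        _queue.CompleteAdding(); // lets the workers drain the queue and exit
        Task.WaitAll(_workers);
        _queue.Dispose();
    }
}
```

Purge only affects items still waiting in the queue; anything a worker has already taken keeps running.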
The ThreadPool exists to help you manage your threads. You should not have to worry about purging it at all since it will make the best performance decisions on your behalf.
If you think you need tighter control over your threads then you could consider creating your own thread management class (similar to ThreadPool) but it would take a lot of work to match and exceed the functionality that ThreadPool has built in.
Take a look here at some of the ThreadPool optimizations and the ideas behind it.
For my second point, I found an article on Code Project that implements a "Cancelable Threadpool", probably for some of your own similar reasons. It would be a good place to start looking if you're going to write your own.