I have created a WCF application, and in multiple places I am using BeginInvoke to run methods asynchronously. In some places the calls run asynchronously as expected, but in others they run synchronously. Because the system is quite complicated, I was hoping to throw this high-level question out there in the hope that someone knows a reason BeginInvoke would be forced to run synchronously.
Possible reasons I have considered and believe are not the cause:
I ran out of ThreadPool Threads - I think at peak I use less than 20 threads.
I use locking across those threads, stopping their concurrent execution - No synchronization is employed, since each call is to a method on a separate WCF ServiceHost
A parent asynchronous method (which definitely is running asynchronously) calls many child BeginInvokes and you can't nest asynchronous invocations - I don't think that is a limitation
The parent asynchronous method is itself part of a WCF ServiceHost that is InstanceContextMode.PerSession and there is some limitation on it nesting asynchronous calls - Again, I don't think so, but FYI
Each child being called is a WCF ServiceHost (different from the parent); the method I am calling belongs to an instance of the same ServiceType and runs with InstanceContextMode.Single and ConcurrencyMode.Single. - Does that somehow prevent the calling routine from running those calls asynchronously? (I don't see why it would, but just in case.)
Any ideas/solutions are much appreciated
I'd like to reference this quote regarding Delegate.BeginInvoke from Joe Duffy, in his book Concurrent Programming on Windows:
All delegate types, by convention, offer a BeginInvoke and EndInvoke method alongside the ordinary synchronous Invoke method. While this is a nice programming model feature, you should stay away from them wherever possible. The implementation uses remoting infrastructure, which imposes a sizable overhead on asynchronous invocations. Queuing work to the thread pool directly is often a better approach, though that means you have to coordinate the rendezvous logic yourself.
I've done my own tests before, and the overhead can amount to many seconds when these are used frequently. Perhaps this is not an answer to your question - a real answer would be almost impossible without seeing and debugging the code. It is rather a suggestion that you take another look at your approach, perhaps by queuing work to the ThreadPool directly, or by using the TPL (Tasks / async-await).
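To make that concrete, here is a minimal sketch of the suggested alternative, assuming .NET 4.5 (Task.Run / async-await); DoWork is a placeholder for whatever your delegates currently wrap:

using System;
using System.Threading.Tasks;

class Example
{
    // Placeholder for whatever work your delegates currently wrap.
    static void DoWork()
    {
        Console.WriteLine("working...");
    }

    static async Task RunChildrenAsync()
    {
        // Instead of someDelegate.BeginInvoke(...)/EndInvoke(...),
        // queue the work to the thread pool via the TPL.
        Task first = Task.Run(() => DoWork());
        Task second = Task.Run(() => DoWork());

        // Rendezvous without blocking the calling thread.
        await Task.WhenAll(first, second);
    }

    static void Main()
    {
        RunChildrenAsync().GetAwaiter().GetResult();
    }
}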
If you're still set on finding the problem with the current code instead of revising it for a better strategy, and you still want help, you should find a way to reproduce your symptom (that it is running synchronously) and provide code demonstrating this.
Related
I'm implementing several small services, each of which uses entity-framework to store certain (but little) data. They also have a fair bit of business-logic so it makes sense to separate them from one another.
I'm certainly aware that async-methods and the async-await pattern itself can solve many problems in regards to performance especially when it comes to any I/O or cpu-intensive operations.
I'm uncertain whether to use the async methods of Entity Framework (e.g. SaveChangesAsync or FirstOrDefaultAsync), because I can't find metrics that say "now you do it, and now you don't" beyond "Is it I/O- or CPU-intensive or not?".
What I've found when researching this topic (not limited to this but these are showing the problem):
not using it can lead to your application becoming unresponsive, because the threads (not CPU cores, but OS threads) can run out as a result of the then-blocking I/O calls to the database.
using it bloats your code and decreases performance because of the context-switches at every method. Especially when I apply it to Entity Framework calls, it means I have at least three context switches for one call, from controller to business logic to repository to database.
What I don't know, and that's what I would like to know from you:
How many OS threads are there? Or to be more precise: if I expect my application and server to be able to handle 100 requests to this service within five seconds (and I don't expect more than that; 100 is already exaggerated), should I back away from using async/await there?
What are the precise metrics that I could look at to answer this question for any of my services?
Or should I rather always use async-methods for I/O calls because they are already there and it could always happen that the load-situation on my server changes and there's so much going on that the async-methods would help me a great deal with that?
I'm certainly aware that async-methods and the async-await pattern itself can solve many problems in regards to performance especially when it comes to any I/O or cpu-intensive operations.
Sort of. The primary benefit of asynchronous code is that it frees up threads. UI apps (i.e., desktop/mobile) manifest this benefit in more responsive user interfaces. Services such as the ones you're writing manifest this benefit in better scalability - the performance benefits are only visible when under load. Also, services only receive this benefit from I/O operations; CPU-bound operations require a thread no matter what, so using await Task.Run on service applications doesn't help at all.
not using it can lead to your application becoming unresponsive, because the threads (not CPU cores, but OS threads) can run out as a result of the then-blocking I/O calls to the database.
Yes. More specifically, the thread pool has a limited injection rate, so it can only grow so far so quickly. Asynchrony (freeing up threads) helps your service handle bursty traffic and heavy load. Quote:
Bear in mind that asynchronous code does not replace the thread pool. This isn’t thread pool or asynchronous code; it’s thread pool and asynchronous code. Asynchronous code allows your application to make optimum use of the thread pool. It takes the existing thread pool and turns it up to 11.
Next question:
using it bloats your code and decreases performance because of the context-switches at every method.
The main performance drawback of async is usually memory-related. There are additional structures that need to be allocated to keep track of the ongoing asynchronous work; in the synchronous world, the thread stack itself holds this information.
What I don't know, and that's what I would like to know from you: [when should I use async?]
Generally speaking, you should use async for any new code doing I/O-based operations (including all EF operations). The metrics-based arguments are more about cost/benefit analysis of converting to async - i.e., given an existing old synchronous codebase, at what point is it worth investing the time to convert it to async.
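As a rough illustration (EF6 assumed; Order, ShopContext and the Closed flag are hypothetical names, not anything from your code), the async EF methods slot directly into an async method:

using System.Data.Entity;        // EF6
using System.Threading.Tasks;

// Hypothetical model and context, just to give the sketch a shape.
public class Order
{
    public int Id { get; set; }
    public bool Closed { get; set; }
}

public class ShopContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

public class OrderService
{
    public async Task<bool> CloseOrderAsync(int orderId)
    {
        using (var db = new ShopContext())
        {
            // The request thread is returned to the pool during both awaits.
            var order = await db.Orders.FirstOrDefaultAsync(o => o.Id == orderId);
            if (order == null)
                return false;

            order.Closed = true;
            await db.SaveChangesAsync();
            return true;
        }
    }
}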
TLDR: Should I use async? YES!
You seem to have fallen for the most common mistake when trying to understand async/await. Async is orthogonal to multi-threading.
To answer your question: when should you use the async method?
if (currentContext.IsAsync && method.HasAsyncVersion)
    return UseAsync.Yes;
else
    return UseAsync.No;
That above is the short version.
Async/Await actually solves a few problems
Unblock UI thread
M:N threading
Multithreaded scheduling and synchronization
Interrupt/event-based asynchronous scheduling
Given the large number of different use cases for async/await, the "assumptions" you state only apply to certain cases.
For example, context switching only happens with multi-threading. Single-threaded, interrupt-based async actually reduces context switching by reducing blocking time and keeping the OS thread well fed with work.
Finally, your question about OS threads rests on some misconceptions.
Firstly, each OS thread requires its own stack (by default .NET reserves 1 MB of contiguous address space per thread, so 100 threads means 100 MB reserved before any work is even done).
Secondly, unless you have 100 physical cores in your PC, your CPUs will have to context switch between the OS threads, stalling while each thread's state is loaded. By using M:N threading you can keep the CPUs busy with fewer OS threads, using green threads (Task in .NET) instead.
Thirdly, not every "await" results in "async" behavior: tasks can complete synchronously, short-circuiting all of the "bloat".
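A minimal sketch of that short-circuiting, using a hypothetical cached lookup: when the key is already cached, the returned task is already completed, so awaiting it continues synchronously with no extra scheduling.

using System.Collections.Concurrent;
using System.Threading.Tasks;

public class CachedLookup
{
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>();

    public Task<string> GetValueAsync(string key)
    {
        string cached;
        if (_cache.TryGetValue(key, out cached))
            return Task.FromResult(cached);   // already completed: the await continues synchronously

        return LoadAndCacheAsync(key);        // the genuinely asynchronous path
    }

    private async Task<string> LoadAndCacheAsync(string key)
    {
        // Stand-in for real async I/O (database, HTTP, ...).
        await Task.Delay(100);
        var value = "value-for-" + key;
        _cache[key] = value;
        return value;
    }
}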
In short, without digging really deep, it is hard to find optimization opportunities by switching from async to sync methods.
I have a .NET 4.5.1 WCF service that handles synchronization from an app that will be used by thousands of users. I currently use Task.WaitAll as shown below and it works fine, but I have read that this is bad, can cause deadlocks, etc. I believe I tried WhenAll in the past and it didn't work; I don't recall the issues, as I'm returning to this for review just to make sure I'm doing it right. My concern is whether the blocking is needed or preferred in this context - a WCF service method - which may be why WaitAll appears to work without issue.
I have about a dozen methods that each update an entity in Entity Framework 6, processing the incoming data against existing data and making the necessary changes. Each of these methods can be expensive, so I would like to use parallelism, mainly to get all the methods working at the same time on this powerful 24-core server. Each method returns a Task and wraps its contents in Task.Run. The DoSync method creates a new List<Task> and adds each of these sync methods to it. I then call Task.WaitAll(taskList.ToArray()) and all works great.
Is this the right way of doing this? I want to make sure this method will scale well, not cause problems, and work properly in a WCF service scenario.
In high-scale services it is often a good idea to use async IO (which you are not doing - you use Task.Run). "High scale" is very loosely defined. The benefit of async IO on the server is that it does not block threads. This leads to less memory usage and less context switching. That is all there is to it.
If you do not need these benefits you can use sync IO and blocking all you like. Nothing bad will happen. Understand, that running 10 queries on background threads and waiting for them will temporarily block 11 threads. This might be fine, or not, depending on the number of concurrent operations you expect.
I suggest you do a little research regarding the scalability benefits of async IO so that you better understand when to use it. Remember that there is a cost to going async: Slower development and more concurrency bugs.
Understand, that async IO is different from just using the thread-pool (Task.Run). The thread-pool is not thread-less while async IO does not use any threads at all. Not even "invisible" threads managed by the runtime.
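To make the distinction concrete, here is a hedged sketch (EF6 assumed; Order, OrdersContext and IsOpen are hypothetical names) contrasting the two:

using System.Collections.Generic;
using System.Data.Entity;   // EF6, for ToListAsync
using System.Linq;
using System.Threading.Tasks;

// Hypothetical EF6 model, just to give the contrast a shape.
public class Order
{
    public int Id { get; set; }
    public bool IsOpen { get; set; }
}

public class OrdersContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
}

public class OpenOrderQueries
{
    // Thread-pool offloading: a pool thread sits blocked inside ToList()
    // for the whole duration of the database call.
    public Task<List<Order>> GetOpenOrdersOnPoolThread()
    {
        return Task.Run(() =>
        {
            using (var db = new OrdersContext())
                return db.Orders.Where(o => o.IsOpen).ToList();
        });
    }

    // True async IO: no thread is held while the database does its work.
    public async Task<List<Order>> GetOpenOrdersAsync()
    {
        using (var db = new OrdersContext())
            return await db.Orders.Where(o => o.IsOpen).ToListAsync();
    }
}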
What I often find is: If you have to ask, you don't need it.
Task.WhenAll is the non-blocking equivalent of Task.WaitAll, and without seeing your code I can't think of any reason why it wouldn't work and wouldn't be preferable. But note that Task.WhenAll itself returns a Task which you must await. Did you do that?
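If you do switch, a hedged sketch of the non-blocking shape could look like this (SyncRequest, SyncCustomersAsync and SyncOrdersAsync are placeholders for your real types and the dozen update methods; WCF on .NET 4.5 supports Task-returning operations):

using System.Collections.Generic;
using System.Threading.Tasks;

public class SyncRequest { }

public class SyncService // : ISyncService (your service contract)
{
    // WCF on .NET 4.5 allows Task-returning operations, so the operation
    // itself can be async and await instead of blocking with WaitAll.
    public async Task DoSyncAsync(SyncRequest request)
    {
        var tasks = new List<Task>
        {
            SyncCustomersAsync(request),
            SyncOrdersAsync(request)
            // ... the rest of the dozen update methods
        };

        // Non-blocking equivalent of Task.WaitAll: the request thread is free
        // while the work runs, and any exceptions surface here when awaited.
        await Task.WhenAll(tasks);
    }

    // Placeholders for the real per-entity update methods.
    private Task SyncCustomersAsync(SyncRequest request)
    {
        return Task.Run(() => { /* EF update work */ });
    }

    private Task SyncOrdersAsync(SyncRequest request)
    {
        return Task.Run(() => { /* EF update work */ });
    }
}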
I see a lot of people in blog posts and here on SO either avoiding or advising against the usage of the Thread class in recent versions of C# (and I mean of course 4.0+, with the addition of Task & friends). Even before, there were debates about the fact that a plain old thread's functionality can be replaced in many cases by the ThreadPool class.
Also, other specialized mechanisms are further rendering the Thread class less appealing, such as Timers replacing the ugly Thread + Sleep combo, while for GUIs we have BackgroundWorker, etc.
Still, the Thread seems to remain a very familiar concept for some people (myself included), people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class. I've been wondering lately if it's time to amend my ways.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
The Thread class cannot be made obsolete because obviously it is an implementation detail of all those other patterns you mention.
But that's not really your question; your question is
are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
Sure. In precisely those cases where one of the higher-level constructs does not meet your needs.
My advice is that if you find yourself in a situation where existing higher-abstraction tools do not meet your needs, and you wish to implement a solution using threads, then you should identify the missing abstraction that you really need, and then implement that abstraction using threads, and then use the abstraction.
Threads are a basic building block for certain things (namely parallelism and asynchrony) and thus should not be taken away. However, for most people and most use cases there are more appropriate things to use, which you mentioned: thread pools (which provide a nice way of handling many small jobs in parallel without overloading the machine by spawning 2000 threads at once), BackgroundWorker (which encapsulates useful events for a single short-lived piece of work), etc.
But just because those are more appropriate in many cases - they shield the programmer from needlessly reinventing the wheel, from making silly mistakes and the like - that does not mean the Thread class is obsolete. It is still used by the abstractions named above, and you would still need it if you require fine-grained control over threads that is not covered by the more specialized classes.
In a similar vein, .NET doesn't forbid the use of arrays, despite List<T> being a better fit for many cases where people use arrays. Simply because you may still want to build things that are not covered by the standard lib.
Task and Thread are different abstractions. If you want to model a thread, the Thread class is still the most appropriate choice. E.g. if you need to interact with the current thread, I don't see any better types for this.
However, as you point out .NET has added several dedicated abstractions which are preferable over Thread in many cases.
The Thread class is not obsolete, it is still useful in special circumstances.
Where I work we wrote a 'background processor' as part of a content management system: a Windows service that monitors directories, e-mail addresses and RSS feeds, and every time something new shows up executes a task on it - typically to import the data.
Attempts to use the thread pool for this did not work: it tried to execute too much stuff at the same time and thrashed the disks, so we implemented our own polling and execution system using the Thread class directly.
The new options make direct use and management of the (expensive) threads less frequent.
people that, when confronted with a task that involves some kind of parallel execution, jump directly to using the good old Thread class.
Which is a very expensive and relatively complex way of doing stuff in parallel.
Note that the expense matters most: you cannot use a full thread to do a small job; it would be counterproductive. The ThreadPool combats the costs, the Task class the complexities (exceptions, waiting and canceling).
To answer the question of "are there any cases when it's necessary or useful to use a plain old Thread object", I'd say a plain old Thread is useful (but not necessary) when you have a long running process that you won't ever interact with from a different thread.
For example, if you're writing an application that subscribes to receive messages from some sort of message queue, and your application is going to do more than just process those messages, then it would be useful to use a Thread, because the thread will be self-contained (i.e. you aren't waiting on it to finish) and it isn't short-lived. Using the ThreadPool class is more about queuing up a bunch of short-lived work items and letting the ThreadPool manage processing each one efficiently as a thread becomes available. Tasks can be used where you would use Thread directly, but in the above scenario I don't think they would buy you much. They help you interact with the thread more easily (which the above scenario doesn't need), and they help determine how many threads should actually be used for a given set of tasks based on the number of processors you have (which isn't what you want here, so you'd tell the Task that your work is LongRunning, in which case the current 4.0 implementation simply creates a separate non-pooled Thread).
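A small sketch of that trade-off, assuming a hypothetical long-running message-listening loop; both variants give the work a dedicated thread instead of tying up the thread pool:

using System.Threading;
using System.Threading.Tasks;

public class MessagePump
{
    // Dedicated, self-contained, long-lived worker: a plain Thread is fine here.
    public void StartWithThread()
    {
        var worker = new Thread(ListenLoop)
        {
            IsBackground = true,
            Name = "queue-listener"
        };
        worker.Start();
    }

    // The Task equivalent: LongRunning hints the scheduler to use a dedicated
    // (non-pool) thread rather than occupying a thread-pool thread.
    public void StartWithTask()
    {
        Task.Factory.StartNew(ListenLoop, TaskCreationOptions.LongRunning);
    }

    private void ListenLoop()
    {
        while (true)
        {
            // Stand-in for a blocking receive from the message queue plus processing.
            Thread.Sleep(1000);
        }
    }
}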
Probably not the answer you were expecting, but I use Thread all the time when coding against the .NET Micro Framework. MF is quite cut down and doesn't include higher level abstractions and the Thread class is super flexible when you need to get the last bit of performance out of a low MHz CPU.
You could compare the Thread class to ADO.NET. It's not the recommended tool for getting the job done, but it's not obsolete. Other tools build on top of it to ease the job.
It's not wrong to use the Thread class over other things, especially if those things don't provide functionality that you need.
It's definitely not obsolete.
The problem with multithreaded apps is that they are very hard to get right (behavior is often nondeterministic, and input, output and internal state all matter), so a programmer should push as much work as possible onto the framework/tools and abstract it away. But the mortal enemy of abstraction is performance.
So my question is, are there any cases when it's necessary or useful to use a plain old Thread object instead of one of the above constructs?
I'd go with Threads and locks only if there will be serious performance problems, high performance goals.
I've always used the Thread class when I need to keep count and control over the threads I've spun up. I realize I could use the threadpool to hold all of my outstanding work, but I've never found a good way to keep track of how much work is currently being done or what the status is.
Instead, I create a collection and place the threads in it after I spin them up - the very last thing a thread does is remove itself from the collection. That way, I can always tell how many threads are running, and I can use the collection to ask each what it's doing. If there's a case where I need to kill them all, normally you'd have to set some kind of "Abort" flag in your application and wait for every thread to notice it on its own and self-terminate - in my case, I can walk the collection and issue a Thread.Abort to each one in turn.
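A rough sketch of that bookkeeping, with illustrative names only; each worker removes itself from the collection as its last act, so the count stays current (note that Thread.Abort, matching the approach above, is discouraged and unsupported on .NET Core/5+):

using System.Collections.Generic;
using System.Threading;

public class TrackedWorkers
{
    private readonly List<Thread> _threads = new List<Thread>();
    private readonly object _sync = new object();

    public void Start(ThreadStart work)
    {
        Thread t = null;
        t = new Thread(() =>
        {
            try
            {
                work();
            }
            finally
            {
                // The very last thing a worker does is remove itself.
                lock (_sync) _threads.Remove(t);
            }
        });

        lock (_sync) _threads.Add(t);
        t.Start();
    }

    public int RunningCount
    {
        get { lock (_sync) return _threads.Count; }
    }

    public void AbortAll()
    {
        // Mirrors the approach described above; on .NET Core/5+ use a
        // cooperative cancellation flag instead, since Abort is unsupported.
        lock (_sync)
        {
            foreach (var t in _threads)
                t.Abort();
        }
    }
}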
In that case, I haven't found a better way than working directly with the Thread class. As Eric Lippert mentioned, the others are just higher-level abstractions, and it's appropriate to work with the lower-level classes when the available high-level implementations don't meet your needs. Just as you sometimes need to make Win32 API calls when .NET doesn't address your exact needs, there will always be cases where the Thread class is the best choice despite recent "advancements."
I'm still fairly new to WF so bear with me if I don't get this worded correctly the first time. ;)
If you're doing selects against a well-normalized database, using primary keys, returning single records, in a fairly low volume environment (a few hundred requests per day), does it really make a difference whether you use CodeActivity vs AsyncCodeActivity?
While I've got some additional research to do on hosting and execution, it will be possible, but not probable, for multiple requests to be received at or near the same time. I'm not sure if that will change the answer or not.
Thanks!
Microsoft used a non-async implementation in their ExecuteSqlQuery activity: http://wf.codeplex.com/releases/view/43585
Async Activities:
"This is useful for custom activities that must perform asynchronous work without holding the workflow scheduler thread and blocking any activities that may be able to run in parallel."
"As a result of going asynchronous, an AsyncCodeActivity may induce an idle point during execution. Due to the volatile nature of asynchronous work, an AsyncCodeActivity always creates a no persist block for the duration of the activity’s execution. This prevents the workflow runtime from persisting the workflow instance in the middle of the asynchronous work, and also prevents the workflow instance from unloading while the asynchronous code is executing."
Source: http://msdn.microsoft.com/en-us/library/ee358731.aspx
Edit: I noticed that I only pointed out the disadvantages of using async; I would consider the responses of Ron and Tim to make a better decision.
In general I strongly encourage activity developers who are doing any kind of I/O to use AsyncCodeActivity and to call the underlying Async APIs whenever possible (a rough sketch follows below). Even if the query is short, this is always preferable.
Obviously - it's not going to make a difference unless you're actually calling an Async API inside your activity implementation.
That said, even if it makes a difference it might not make a noticeable difference in many apps. Potential reasons:
The query just runs too fast.
You aren't running multiple queries in parallel. (Running many async operations in parallel is faster than doing them synchronously and thereby sequentially.)
You don't run a large number of workflows in parallel such as would be needed to experience thread contention.
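For illustration, here is a rough sketch of "call the underlying Async APIs from an AsyncCodeActivity", built on ADO.NET's Begin/End pattern. ExecuteNonQueryActivity and its arguments are illustrative names, not the activity from the linked CodePlex release; on .NET 4.0 the connection string also needs Asynchronous Processing=True.

using System;
using System.Activities;
using System.Data.SqlClient;

public sealed class ExecuteNonQueryActivity : AsyncCodeActivity<int>
{
    public InArgument<string> ConnectionString { get; set; }
    public InArgument<string> CommandText { get; set; }

    protected override IAsyncResult BeginExecute(
        AsyncCodeActivityContext context, AsyncCallback callback, object state)
    {
        var connection = new SqlConnection(ConnectionString.Get(context));
        var command = new SqlCommand(CommandText.Get(context), connection);
        connection.Open();

        // Stash what EndExecute will need; the workflow scheduler thread is
        // released while the database does its work.
        context.UserState = command;
        return command.BeginExecuteNonQuery(callback, state);
    }

    protected override int EndExecute(AsyncCodeActivityContext context, IAsyncResult result)
    {
        var command = (SqlCommand)context.UserState;
        try
        {
            return command.EndExecuteNonQuery(result);
        }
        finally
        {
            command.Connection.Dispose();
            command.Dispose();
        }
    }
}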
I'm porting a WPF app to WP7, and in the process I've had to refactor all the code that touches the network. The old code used the synchronous methods of the WebRequest object in background threads, but these methods no longer exist in WP7.
The result has been bewildering, and makes me feel like I'm doing something wrong. I've had to litter my views with thread dispatching code - the only alternative to this that I see is to supply the dispatcher to the lower tiers of the app, which would break platform-independence and muddy the boundary with the UI. I've lost the ability to make chained calls over the network from loops, and instead have callbacks invoking themselves. I've lost try/catch error handling and instead have OnSuccess and OnError callbacks everywhere. I'm now always unintentionally running code in background threads that are invoked by callbacks. I fondly remember the days when I was able to return values from methods.
I know continuation-passing style is supposed to be great, but I think all of the above has made the code more brittle and less readable, and has made threading issues more complex than they need to be.
Apologies if this question is vague, I'd just like to know if I'm missing some big picture here.
This is a limitation of Silverlight, which requires asynchronous network access (WCF proxy calls, WebClient, WebRequest, etc.). All synchronous network-reliant method calls have been removed from the framework.
To be crass: welcome to asynchronous programming. The only thing you did wrong was not making the calls asynchronous in the first place :)
I'm not 100% clear on the exact reasons MS removed the sync calls from web-dependent objects in Silverlight, but the explanations I hear always center on one or two reasons in some combination:
Browsers are architected on asynchronous network calls. Introducing synchronous calls would cause bad behavior/broken apps/crashes/etc.
If they gave everyone the "easy out" of making synchronous calls, the world would be littered with Silverlight apps that always froze while doing anything on the network, making Silverlight as a platform look bad.
That said - WCF proxies in Silverlight have the behavior that they always perform their callback on the calling thread. This is most often the UI thread, meaning you don't have to do any dispatching. I do not know if WebClient/WebRequest in Silverlight share this behavior.
As for the dispatcher, you could look into using a SynchronizationContext instead. The MVVM reference implementation in the MS Patterns and Practices Prism guidance does this - in the repository (data access class that actually makes calls out to an abstracted external service), they have a SynchronizationContext member that is initialized to System.Threading.SynchronizationContext.Current. This is the UI thread, if the constructor is called on the UI thread (it should be). All results from the service calls are then handled with mySynchronizationContext.Post.
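A minimal sketch of that pattern, using WebClient rather than a generated proxy so the snippet stands alone (CustomerRepository and the callback names are illustrative); the only point is capturing the context and calling Post:

using System;
using System.Net;
using System.Threading;

public class CustomerRepository
{
    private readonly SynchronizationContext _uiContext;

    public CustomerRepository()
    {
        // Assumes construction happens on the UI thread.
        _uiContext = SynchronizationContext.Current;
    }

    public void GetCustomerJson(Uri serviceUri, Action<string> onResult, Action<Exception> onError)
    {
        var client = new WebClient();
        client.DownloadStringCompleted += (sender, e) =>
        {
            // Marshal the outcome back to the captured (UI) context.
            if (e.Error != null)
                _uiContext.Post(_ => onError(e.Error), null);
            else
                _uiContext.Post(_ => onResult(e.Result), null);
        };
        client.DownloadStringAsync(serviceUri);
    }
}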
Questions like this seem to behave like buses. You don't see any for ages then two come along almost at the same time. See this answer to a more concrete version of this question asked earlier today.
I have to agree with you: continuation passing is tricky. A really useful technique is to borrow the C# yield return construct to create a machine that is able to maintain state between asynchronous operations. For a really good explanation see this blog by Jeremy Likness.
Personally I prefer a "less is more" approach, so the AsyncOperationService is a very small chunk of code. You'll note that it has a single callback for both success and failure, and there are no interfaces to implement - just a simple delegate, Action<Action<Exception>>, which is typed as AsyncOperation to make it more convenient.
The basic steps to coding against this are:-
Code as if synchronous execution were possible
Create methods that return an AsyncOperation for only the smallest part that has to be asynchronous - usually some WebRequest or WCF call, just enough to get past the async bit; see my other answer for a good example.
Convert the synchronous "pseudo-code" to yield these AsyncOperations and change the calling code to "Run" the resulting enumerable.
The final code looks quite similar to the synchronous code you might be more familiar with.
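Here is a rough sketch of what steps 1-3 can look like, assuming the AsyncOperation delegate (the Action<Action<Exception>> mentioned above) and the Run method from the linked AsyncOperationService; DownloadOp and LoadCustomer are illustrative names only:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net;

// Assumed shape of the AsyncOperation type described above: the operation is
// handed a "completed" callback that takes null on success, or the exception.
public delegate void AsyncOperation(Action<Exception> completed);

public static class CustomerLoader
{
    // Wrap just the async bit (here a WebRequest) in an AsyncOperation.
    public static AsyncOperation DownloadOp(Uri uri, Action<string> resultHandler)
    {
        return completed =>
        {
            var request = WebRequest.Create(uri);
            request.BeginGetResponse(ar =>
            {
                try
                {
                    using (var response = request.EndGetResponse(ar))
                    using (var reader = new StreamReader(response.GetResponseStream()))
                        resultHandler(reader.ReadToEnd());
                    completed(null);        // success
                }
                catch (Exception ex)
                {
                    completed(ex);          // failure via the same single callback
                }
            }, null);
        };
    }

    // "Pseudo-synchronous" code written as an iterator; the AsyncOperationService's
    // Run method (from the linked answer) drives it one yield at a time.
    public static IEnumerable<AsyncOperation> LoadCustomer(Uri uri, Action<string> display)
    {
        string json = null;
        yield return DownloadOp(uri, s => json = s);   // the only truly async step
        display(json);                                  // plain synchronous code resumes here
    }
}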
As to accidentally running things on a background thread, that last answer included this useful AsyncOperation:-
public static AsyncOperation SwitchToUIThread()
{
    return completed => Deployment.Current.Dispatcher.BeginInvoke(() => completed(null));
}
You can use that as the final yield in the run to ensure that code executing in the completed callback runs on the UI thread. It's also useful to "flip" apparently synchronous code onto the UI thread when necessary.