I am using asynchronous methods in some of my projects and I like them, since they allow my application to be a lot more scalable. However, I am wondering how asynchronous methods really work in the background. How does .NET (or Windows?) know that a call has completed? Depending on the number of asynchronous calls I make, I can see that new threads are created (not always though…). Why?
In addition, I would like to monitor how long a request takes to complete. To test the concept, I wrote the following code, which calls a web service asynchronously and starts a stopwatch immediately afterwards.
for (int i = 0; i < 10000; i++)
{
    myWebService.BeginMyMethod(new MyRequest(), result, new AsyncCallback(callback), i);
    stopWatches[i].Start();
}
// The callback stops the stopwatch after calling EndMyMethod
This doesn't work, since all 10000 requests have the same begin time and the measured duration goes up linearly (call 0 = duration 1, call 1 = duration 2, etc.). How can I monitor the real duration of a call made with an asynchronous method (from the moment the request is really executed until the end)?
UPDATE: Does an asynchronous method block the flow? I understand that it uses the .NET ThreadPool, but how does an IAsyncResult know that a call has completed and that it's time to call the callback method?
The code is the railroad and the thread is the train. As the train travels along the railroad, it executes the code.
BeginMyMethod is executed by the main thread. If you look inside BeginMyMethod, it simply adds a delegate for MyMethod to the ThreadPool's queue. The actual MyMethod is executed by one of the trains of the train pool. The completion routine that is called when MyMethod is done is executed by the same thread that executed MyMethod, not by your main thread that runs the rest of the code. While a thread pool thread is busy executing MyMethod, the main thread can either ride some other portion of the railroad system (execute some other code), or simply sleep, waiting until a certain semaphore is lit up.
Therefore there's no such thing as IAsyncResult "knowing" when to call the completion routine. Instead, the completion routine is simply a delegate called by the thread pool's thread right after it's done executing MyMethod.
I hope you don't mind the somewhat childish train analogy, I know it helped me more than once when explaining multithreading to people.
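To make the train analogy concrete, here is a minimal sketch of the model described in this answer. BeginWork is a hypothetical stand-in for BeginMyMethod (it is not the real proxy method, which may complete through I/O completion rather than a queued work item); it only assumes that the work and its completion callback are queued to the thread pool together.

```csharp
using System;
using System.Threading;

public static class CallbackThreadDemo
{
    // Hypothetical stand-in for BeginMyMethod: queue the work plus its
    // completion callback on the thread pool and return immediately.
    public static void BeginWork(Action work, Action callback)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            work();      // runs on a pool thread ("a train from the pool")
            callback();  // the same pool thread then runs the completion routine
        });
    }

    // Returns the managed thread IDs of the caller, the work, and the callback.
    public static (int Caller, int Work, int Callback) Run()
    {
        int caller = Environment.CurrentManagedThreadId;
        int work = -1, callback = -1;
        using var done = new ManualResetEventSlim();

        BeginWork(
            () => { work = Environment.CurrentManagedThreadId; },
            () => { callback = Environment.CurrentManagedThreadId; done.Set(); });

        done.Wait();  // the "semaphore" the calling thread sleeps on
        return (caller, work, callback);
    }
}
```

Calling CallbackThreadDemo.Run() should report the same ID for the work and the callback, and a different one for the caller.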
The crux of it is that calling Begin queues up a request to execute your method. The method is actually executed on the ThreadPool, which is a set of worker threads provided by the runtime.
The thread pool is a limited set of threads that crunch through async tasks as they get put into the queue. That explains why you see the execution time get longer and longer: your methods may each execute in approximately the same time, but they don't start until all previous methods in the queue have been executed.
To monitor the length of time it takes to actually execute the async method, you have to start and stop the timer at the beginning and end of your method.
Here are the docs for the ThreadPool class, and an article about async methods, which do a better job of explaining what's going on.
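As a rough sketch of the timing advice above, the following separates the time a call spends waiting in the queue from the time it spends executing. MyMethod here is a hypothetical placeholder that just sleeps, and Task.Run is used as an assumed substitute for the Begin/End pair in the question, so treat it as an illustration rather than a drop-in fix.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CallTiming
{
    // Hypothetical stand-in for the web-service call in the question.
    private static void MyMethod() => Thread.Sleep(20);

    // Measures one call: how long it waited in the queue, and how long it ran.
    public static Task<(TimeSpan Queued, TimeSpan Executing)> TimedCall()
    {
        var sinceQueued = Stopwatch.StartNew();   // started BEFORE the work is queued
        return Task.Run<(TimeSpan Queued, TimeSpan Executing)>(() =>
        {
            TimeSpan queued = sinceQueued.Elapsed; // the work has only now started
            var executing = Stopwatch.StartNew();  // timer inside the method itself
            MyMethod();
            return (queued, executing.Elapsed);
        });
    }

    // Starts many calls at once and collects the per-call timings.
    public static Task<(TimeSpan Queued, TimeSpan Executing)[]> MeasureAsync(int calls)
    {
        return Task.WhenAll(Enumerable.Range(0, calls).Select(_ => TimedCall()));
    }
}
```

With many calls, the Queued component would be expected to grow from call to call while Executing stays roughly constant, which matches the linear growth described in the question.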
Asynchronous methods work by using the .NET ThreadPool. They will push the work onto a ThreadPool thread (potentially creating one if needed, but usually just reusing one) in order to work in the background.
In your case, you can do what you're doing; however, realize that the ThreadPool has a limited number of threads with which it will work. You're going to spawn your work onto background threads, and the first will run immediately, but after a while they will queue up and not start until earlier "tasks" have run to completion. This will give the appearance of the calls taking longer and longer.
However, your stopwatch criterion is somewhat flawed. You should measure the total time it takes to complete N tasks, not N times to complete one task. This will be a much more useful metric.
It's possible that a majority of the execution time happens before BeginMyMethod() returns. In that case your measurement will be too low. In fact, depending on the API, BeginMyMethod() may call the callback before leaving the stack itself. Moving the call to Stopwatch.Start() up, so that it runs before BeginMyMethod(), should help then.
I thought that they were basically the same thing — writing programs that split tasks between processors (on machines that have 2+ processors). Then I'm reading this, which says:
Async methods are intended to be non-blocking operations. An await
expression in an async method doesn’t block the current thread while
the awaited task is running. Instead, the expression signs up the rest
of the method as a continuation and returns control to the caller of
the async method.
The async and await keywords don't cause additional threads to be
created. Async methods don't require multithreading because an async
method doesn't run on its own thread. The method runs on the current
synchronization context and uses time on the thread only when the
method is active. You can use Task.Run to move CPU-bound work to a
background thread, but a background thread doesn't help with a process
that's just waiting for results to become available.
and I'm wondering whether someone can translate that to English for me. It seems to draw a distinction between asynchronicity (is that a word?) and threading and imply that you can have a program that has asynchronous tasks but no multithreading.
Now I understand the idea of asynchronous tasks such as the example on pg. 467 of Jon Skeet's C# In Depth, Third Edition
async void DisplayWebsiteLength ( object sender, EventArgs e )
{
    label.Text = "Fetching ...";
    using ( HttpClient client = new HttpClient() )
    {
        Task<string> task = client.GetStringAsync("http://csharpindepth.com");
        string text = await task;
        label.Text = text.Length.ToString();
    }
}
The async keyword means "This function, whenever it is called, will not be called in a context in which its completion is required for everything after its call to be called."
In other words, writing it in the middle of some task
int x = 5;
DisplayWebsiteLength();
double y = Math.Pow((double)x,2000.0);
, since DisplayWebsiteLength() has nothing to do with x or y, will cause DisplayWebsiteLength() to be executed "in the background", like
processor 1                            | processor 2
-------------------------------------------------------------------
int x = 5;                             | DisplayWebsiteLength()
double y = Math.Pow((double)x,2000.0); |
Obviously that's a stupid example, but am I correct or am I totally confused or what?
(Also, I'm confused about why sender and e aren't ever used in the body of the above function.)
Your misunderstanding is extremely common. Many people are taught that multithreading and asynchrony are the same thing, but they are not.
An analogy usually helps. You are cooking in a restaurant. An order comes in for eggs and toast.
Synchronous: you cook the eggs, then you cook the toast.
Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
Now does it make sense that multithreading is only one kind of asynchrony? Threading is about workers; asynchrony is about tasks. In multithreaded workflows you assign tasks to workers. In asynchronous single-threaded workflows you have a graph of tasks where some tasks depend on the results of others; as each task completes it invokes the code that schedules the next task that can run, given the results of the just-completed task. But you (hopefully) only need one worker to perform all the tasks, not one worker per task.
It will help to realize that many tasks are not processor-bound. For processor-bound tasks it makes sense to hire as many workers (threads) as there are processors, assign one task to each worker, assign one processor to each worker, and have each processor do the job of nothing else but computing the result as quickly as possible. But for tasks that are not waiting on a processor, you don't need to assign a worker at all. You just wait for the message to arrive that the result is available and do something else while you're waiting. When that message arrives then you can schedule the continuation of the completed task as the next thing on your to-do list to check off.
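The eggs-and-toast workflow can be sketched in code. This is only an illustration under stated assumptions: CookAsync is a hypothetical helper, and Task.Delay stands in for "set a timer". No thread is blocked while either item cooks. In a console program without a synchronization context the continuations may run on pool threads, so the sketch shows the absence of blocked workers rather than strict single-threadedness.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Breakfast
{
    // Hypothetical helper: "start cooking and set a timer".
    private static async Task<string> CookAsync(string item, int milliseconds)
    {
        await Task.Delay(milliseconds);  // a timer; no thread waits here
        return item;
    }

    public static async Task<(string[] Items, TimeSpan Elapsed)> MakeBreakfastAsync()
    {
        var clock = Stopwatch.StartNew();

        Task<string> eggs = CookAsync("eggs", 300);    // start the eggs
        Task<string> toast = CookAsync("toast", 200);  // start the toast

        // The cook is free here: this is where "cleaning the kitchen" would go.

        string[] items = await Task.WhenAll(eggs, toast);  // both timers have gone off
        return (items, clock.Elapsed);
    }
}
```

Because both items cook at the same time, the elapsed time should be close to the longer of the two delays rather than their sum.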
So let's look at Jon's example in more detail. What happens?
Someone invokes DisplayWebsiteLength. Who? We don't care.
It sets a label, creates a client, and asks the client to fetch something. The client returns an object representing the task of fetching something. That task is in progress.
Is it in progress on another thread? Probably not. Read Stephen's article on why there is no thread.
Now we await the task. What happens? We check to see if the task has completed between the time we created it and the time we awaited it. If yes, then we fetch the result and keep running. Let's suppose it has not completed. We sign up the remainder of this method as the continuation of that task and return.
Now control has returned to the caller. What does it do? Whatever it wants.
Now suppose the task completes. How did it do that? Maybe it was running on another thread, or maybe the caller that we just returned to allowed it to run to completion on the current thread. Regardless, we now have a completed task.
The completed task asks the correct thread -- again, likely the only thread -- to run the continuation of the task.
Control passes immediately back into the method we just left at the point of the await. Now there is a result available so we can assign text and run the rest of the method.
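The steps above can be sketched by writing the await out by hand. This is a simplified illustration, not what the compiler actually generates (the real output is a state machine): LengthByHand is a hypothetical method that checks whether the task is already done and otherwise registers the rest of the method as the task's continuation.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

public static class ContinuationSketch
{
    // The ordinary version, for comparison.
    public static async Task<int> LengthWithAwait(Task<string> fetch)
    {
        string text = await fetch;
        return text.Length;
    }

    // Roughly the same behaviour, with the await spelled out manually.
    public static Task<int> LengthByHand(Task<string> fetch)
    {
        var result = new TaskCompletionSource<int>();
        TaskAwaiter<string> awaiter = fetch.GetAwaiter();

        // "The remainder of the method": runs once the fetch has a result.
        void RestOfMethod()
        {
            try { result.SetResult(awaiter.GetResult().Length); }
            catch (Exception ex) { result.SetException(ex); }
        }

        if (awaiter.IsCompleted)
            RestOfMethod();                    // already finished: just keep running
        else
            awaiter.OnCompleted(RestOfMethod); // sign up the remainder as the continuation

        return result.Task;                    // control returns to the caller right away
    }
}
```

Passing an unfinished task to LengthByHand returns an incomplete task immediately; the returned task completes only after the fetch completes and the continuation has run.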
It's just like in my analogy. Someone asks you for a document. You send away in the mail for the document, and keep on doing other work. When it arrives in the mail you are signalled, and when you feel like it, you do the rest of the workflow -- open the envelope, pay the delivery fees, whatever. You don't need to hire another worker to do all that for you.
In-browser JavaScript is a great example of an asynchronous program that has no multithreading.
You don't have to worry about multiple pieces of code touching the same objects at the same time: each function will finish running before any other JavaScript is allowed to run on the page. (Update: Since this was written, JavaScript has added async functions and generator functions. These functions do not always run to completion before any other JavaScript is executed: whenever they reach a yield or await keyword, they yield execution to other JavaScript, and can continue execution later, similar to C#'s async methods.)
However, when doing something like an AJAX request, no code is running at all, so other JavaScript can respond to things like click events until that request comes back and invokes the callback associated with it. If one of these other event handlers is still running when the AJAX response arrives, the request's callback won't be called until that handler is done. There's only one JavaScript "thread" running, even though it's possible for you to effectively pause the thing you were doing until you have the information you need.
In C# applications, the same thing happens any time you're dealing with UI elements--you're only allowed to interact with UI elements when you're on the UI thread. If the user clicked a button, and you wanted to respond by reading a large file from the disk, an inexperienced programmer might make the mistake of reading the file within the click event handler itself, which would cause the application to "freeze" until the file finished loading because it's not allowed to respond to any more clicking, hovering, or any other UI-related events until that thread is freed.
One option programmers might use to avoid this problem is to create a new thread to load the file, and then tell that thread's code that when the file is loaded it needs to run the remaining code on the UI thread again so it can update UI elements based on what it found in the file. Until recently, this approach was very popular because it was what the C# libraries and language made easy, but it's fundamentally more complicated than it has to be.
If you think about what the CPU is doing when it reads a file at the level of the hardware and Operating System, it's basically issuing an instruction to read pieces of data from the disk into memory, and to hit the operating system with an "interrupt" when the read is complete. In other words, reading from disk (or any I/O really) is an inherently asynchronous operation. The concept of a thread waiting for that I/O to complete is an abstraction that the library developers created to make it easier to program against. It's not necessary.
Now, most I/O operations in .NET have a corresponding ...Async() method you can invoke, which returns a Task almost immediately. You can add callbacks to this Task to specify code that you want to have run when the asynchronous operation completes. You can also specify which thread you want that code to run on, and you can provide a token which the asynchronous operation can check from time to time to see if you decided to cancel the asynchronous task, giving it the opportunity to stop its work quickly and gracefully.
Until the async/await keywords were added, C# was much more obvious about how callback code gets invoked, because those callbacks were in the form of delegates that you associated with the task. In order to still give you the benefit of using the ...Async() operation, while avoiding complexity in code, async/await abstracts away the creation of those delegates. But they're still there in the compiled code.
So you can have your UI event handler await an I/O operation, freeing up the UI thread to do other things, and more-or-less automatically returning to the UI thread once you've finished reading the file--without ever having to create a new thread.
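As a small sketch of the ...Async() pattern described above, the following awaits a file read and accepts a cancellation token. CountCharsAsync is a hypothetical helper; File.ReadAllTextAsync is assumed to be available, which holds on .NET Core 2.0 and later but not on the older .NET Framework.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncIoDemo
{
    // Reads a file without blocking the calling thread and returns its length.
    // In a UI event handler, the UI thread would be free while the read is pending.
    public static async Task<int> CountCharsAsync(string path, CancellationToken token)
    {
        // The token lets the caller ask the operation to stop early.
        string text = await File.ReadAllTextAsync(path, token);

        // Everything after the await is the "callback" the compiler builds for us.
        return text.Length;
    }
}
```

A caller that no longer needs the result can cancel the token's source; awaiting the returned task then throws an OperationCanceledException instead of yielding a length.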
Let's say I properly use async-await, like
await client.GetStringAsync("http://stackoverflow.com");
I understand that the thread that invokes the await becomes "free", that is, something further up the call chain isn't stuck executing some loop equivalent to
bool done = false;
string html = null;
for(; !done; done = GetStringIfAvailable(ref html));
which is what it would be doing if I called the synchronous version of GetStringAsync (probably called GetString by convention).
However, here's where I get confused. Even if the calling thread, or any other thread in the application's pool of available threads, isn't blocked with such a loop, then something is, because, as I understand it, at a low level there is always polling going on. So, instead of lowering the total amount of work, I'm simply pushing work to something "beneath" my application's threads ... or something like that.
Can someone clear this up for me?
No.
The compiler will convert methods that use async / await into state machines that can be broken up into multiple steps. Once an await is hit, the state of the method is stored and execution is "offloaded" back to the code that called it. If the task is waiting on things like disk I/O, the OS kernel will end up relying on physical CPU interrupts to let the kernel know when to signal the application to resume processing. The state of the pending method is then loaded and queued up to run (by default on the captured synchronization context, which in a UI application means the thread that hit the await; with ConfigureAwait(false), on any free thread pool thread). Think of it like an event, where the application asks the hardware to "ping" it once the work is done, while the application gets back to doing whatever it was doing before.
There are some cases where a new thread is spun up to do the work, such as Task.Run, which does the work on a ThreadPool thread, but no thread is blocked while awaiting its completion.
It is important to keep in mind that asynchronous operations using async / await are all about pausing, storing, retrieving, and resuming that state machine. The mechanism doesn't really care about what happens inside the Task; what happens there, and how it happens, isn't directly related to async / await.
I was very confused by async / await too, until I really understood how the method is converted to a state-machine. Reading up on exactly what your async methods get converted to by the compiler might help.
You're pushing it off onto the operating system--which will run some other thread if it can rather than simply wait. It only ends up in a busy-wait when it can't find any thread that wants to run.
What is an asynchronous method? I think I know, but I keep confusing it with parallelism. I'm not sure what the difference between an asynchronous method and parallelism is.
Also, what is the difference between using threading classes and asynchronous classes?
EDIT
Some code demonstrating the difference between async, threading and parallelism would be useful.
What are asynchronous methods?
Asynchronous methods come into the discussion when we are talking about potentially lengthy operations. Typically we need such an operation to complete in order to meaningfully continue program execution, but we don't want to "pause" until the operation completes (because pausing might mean e.g. that the UI stops responding, which is clearly undesirable).
An asynchronous method is one that we call to start the lengthy operation. The method should do what it needs to start the operation and return "very quickly" so that there are no processing delays.
Async methods typically return a token that the caller can use to query if the operation has completed yet and what its result was. In some cases they take a callback (delegate) as an argument; when the operation is complete the callback is invoked to signal the caller that their results are ready and pass them back. In .NET, AsyncCallback is a commonly used callback signature, although of course in general the callback can look like anything.
So who does actually run the lengthy operation?
I said above that an async method starts a lengthy operation, but what does "start" mean in this context? Since the method returns immediately, where is the actual work being done?
In the general case an execution thread needs to keep watch over the process. Since it's not the thread that called the async method that pauses, who does? The answer is, a thread picked for this purpose from the managed thread pool.
What's the connection with threading?
In this context my interpretation of "threading" is simply that you explicitly spin up a thread of your own and delegate it to execute the task in question synchronously. This thread will block for a time and presumably will signal your "main" thread (which is free to continue executing) when the operation is complete.
This designated worker thread might be pulled out of the thread pool (beware: doing very lengthy processing in a thread pool thread is not recommended!) or it might be one that you started just for this purpose.
First off, what is a method and what is a thread? A method is a unit of work that either (1) performs a useful side effect, like writing to a file, or (2) computes a result, like making a bitmap of a fractal. A thread is a worker that performs that work.
A method is synchronous if in order to use the method -- to get the side effect or the result -- your thread must do nothing else from the point where you request the work to be done until the point where it is finished.
A method is asynchronous if your thread tells the method that it needs the work to be done, and the method says "OK, I'll do that and I'll call you when it is finished".
Usually the way an asynchronous method does that is it makes another worker -- it grabs a thread from the pool. This is particularly true if the method needs to make heavy use of a CPU. But not always; there is no requirement that an asynchronous method spins up another thread.
Does that make sense?
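Since the question asks for code, here is a minimal sketch contrasting the three ideas. All three methods are hypothetical examples: WithThread hands work to an explicit worker, WithParallel splits CPU-bound work across cores, and WithAsync waits for something (simulated with Task.Delay) without occupying a thread.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class ThreeStyles
{
    // Threading: create a dedicated worker and wait for it to finish.
    public static int WithThread(int n)
    {
        int result = 0;
        var worker = new Thread(() => { result = n * n; });
        worker.Start();
        worker.Join();   // the calling thread blocks until the worker is done
        return result;
    }

    // Parallelism: spread CPU-bound work over the available cores.
    public static long WithParallel(int[] values)
    {
        long sum = 0;
        Parallel.ForEach(values, x =>
        {
            Interlocked.Add(ref sum, (long)x * x);  // workers share 'sum' safely
        });
        return sum;
    }

    // Asynchrony: wait for a result without blocking any thread meanwhile.
    public static async Task<int> WithAsync(int n)
    {
        await Task.Delay(50);  // stands in for I/O, such as a network call
        return n * n;
    }
}
```

The first two are about workers; the third is about a task that needs no worker while it waits, which is the distinction drawn above.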
Say you need to clean the house, cook the dinner and put the children to bed.
Synchronous:
You clean the house, then cook dinner, then put the children to bed.
Parallel:
You hire 3 people to clean the house, cook dinner and put the children to bed. But you don't trust them so keep a supervisory role, looking over them and waiting for them to finish. Only when they've all finished do they get paid.
Asynchronous:
You tell one child to clean the house and another to cook dinner. When each has finished their chore, they put themselves to bed, while you put your feet up with a glass of wine in front of the TV.
First, you have to understand that if you want parallelism, the whole structure needs to be parallel; I mean that if you have an asynchronous method, you need an asynchronous call.
In web services or web applications, asynchronous methods can be called with AJAX (just one of many ways), which is asynchronous. In one method you can have multiple threads; this is the key difference between async methods and multiple threads.
And the main point: the difference between a standard method and an async method is that if you make two calls to a standard method at the same time, to the same controller, from an asynchronous caller (like AJAX), the second call will only begin once the first call has completed. If the methods you called were asynchronous, both calls will begin at the same time; on multi-core servers this can achieve up to twice the standard speed (for two calls).
The speed of the parallelism is measured by this law.
Sometimes when Delegate.BeginInvoke is invoked, it takes more than one second to execute the delegate method.
What could be the reasons for the delay? I get this issue once or twice a day in an application which runs continuously.
Please help me.
Thanks!
The thread pool manager makes sure that only as many threads are allowed to execute as you have CPU cores. As soon as one completes, another one that's waiting in the queue is allowed to execute.
Twice a second, it re-evaluates what's going on with the running threads. If they don't complete, it assumes they are blocked and allows another waiting thread to run. On the typical two-core CPU, you'll get two threads running right away, the 3rd thread starts after one second, the 4th thread after 1.5 seconds, etcetera.
Well, there's your second. The Q&D fix is to use ThreadPool.SetMinThreads(), but that's the sledgehammer solution. The real issue is that your program is using thread pool threads for long-running tasks. Either because they execute a lot of code or because they block on some kind of I/O request. The latter being the more common case.
The way to solve it is to not use a thread pool thread for such a blocking task, but to use the Thread class instead. Don't do this if the threads are actually burning CPU cycles; you'll slow everything down. That's easy to tell: you'll see 100% CPU load in Taskmgr.exe.
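A minimal sketch of that advice, assuming the blocking work can be expressed as an Action: RanOnPoolThread is a hypothetical helper that runs the work on a dedicated Thread and reports whether it ended up on a pool thread (it should not).

```csharp
using System;
using System.Threading;

public static class DedicatedThreadDemo
{
    // Runs blocking work on its own thread so the thread pool is not tied up.
    // Returns whether the work executed on a thread pool thread.
    public static bool RanOnPoolThread(Action blockingWork)
    {
        bool onPoolThread = true;

        var thread = new Thread(() =>
        {
            onPoolThread = Thread.CurrentThread.IsThreadPoolThread;
            blockingWork();  // e.g. a long wait on some I/O request
        })
        {
            IsBackground = true,          // do not keep the process alive
            Name = "blocking-io-worker"   // name chosen for illustration only
        };

        thread.Start();
        thread.Join();  // only for the demo; real code would usually not wait here
        return onPoolThread;
    }
}
```

Because the work never occupies a pool thread, other BeginInvoke calls are not held up behind it.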
Since you're using Delegate.BeginInvoke then you're, indirectly, using the ThreadPool. The ThreadPool recycles completed threads and allows them to be reused without going through the expense of constructing new threads and tearing completed threads down.
So... when you use Delegate.BeginInvoke you're adding the method to be invoked to a queue, as soon as the ThreadPool thinks it has an available thread for your task it will execute. However, if the ThreadPool is out of available threads then you'll be left waiting.
System.Threading.ThreadPool has several properties and methods to show how many threads are available, maximums, etc. I would try monitoring those counts to see if it looks like the ThreadPool is being spread thin.
If that's the case then the best resolution is to ensure that the ThreadPool is only being used for short-lived (small) tasks. If it's being used for long-running tasks then those tasks should be modified to use their own dedicated thread rather than occupying the ThreadPool.
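The counters mentioned above can be read with the static ThreadPool methods. This is a small sketch; PoolStats and the shape of its result are made up for illustration, while GetMinThreads, GetMaxThreads and GetAvailableThreads are the actual ThreadPool methods.

```csharp
using System.Threading;

public static class PoolStats
{
    // Takes a snapshot of the worker-thread limits and current usage.
    public static (int Min, int Max, int Busy) WorkerSnapshot()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        ThreadPool.GetAvailableThreads(out int availableWorker, out _);

        // "Available" is the gap between the maximum and the threads in use,
        // so the number of busy workers is the difference.
        return (minWorker, maxWorker, maxWorker - availableWorker);
    }
}
```

Logging this snapshot periodically, and around the slow BeginInvoke calls, should show whether the number of busy workers climbs past the minimum when the delays occur.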
Can you set the priority of the BeginInvoke?
http://msdn.microsoft.com/en-us/library/system.windows.threading.dispatcherpriority.aspx
Do you have other BeginInvoke calls waiting?
"If multiple BeginInvoke calls are made at the same DispatcherPriority, they will be executed in the order the calls were made."
http://msdn.microsoft.com/en-us/library/ms591206.aspx
I want to implement a timeout on the execution of tasks in a project that uses the CCR. Basically, when I post an item to a Port or enqueue a Task to a DispatcherQueue, I want to be able to abort the task or the thread that it's running on if it takes longer than some configured time. How can I do this?
Can you confirm what you are asking? Are you running a long-lived task in the Dispatcher? Killing the thread would break the CCR model, so you need to be able to signal to the thread to finish its work and yield. Assuming it's a loop that is not finishing quickly enough, you might choose to enqueue a timer:
var resultTimeoutPort = new Port<DateTime>();
dispatcherQueue.EnqueueTimer(TimeSpan.FromSeconds(RESULT_TIMEOUT),
                             resultTimeoutPort);
and ensure the blocking thread has available a reference to resultTimeoutPort. In the blocking loop, one of the exit conditions might be:
do
{
    // foomungus amount of work
} while (resultTimeoutPort.Test() == null &&
         someOtherCondition);
Please post more info if I'm barking up the wrong tree.
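The same cooperative idea can be sketched without the CCR. Note that this swaps in a different mechanism: a CancellationTokenSource with a timeout plays the role of the timer port, and the loop checks it as one of its exit conditions. WorkUntilTimeout and its parameters are hypothetical, and the sleep stands in for a slice of real work.

```csharp
using System;
using System.Threading;

public static class CooperativeTimeout
{
    // Runs a work loop until it finishes or the timeout elapses.
    // Returns the number of iterations that were completed.
    public static int WorkUntilTimeout(TimeSpan timeout, int maxIterations)
    {
        // The token is signalled automatically once the timeout has elapsed.
        using var timeoutSource = new CancellationTokenSource(timeout);
        int iterations = 0;

        do
        {
            Thread.Sleep(1);  // placeholder for one slice of the real work
            iterations++;
        } while (!timeoutSource.Token.IsCancellationRequested &&
                 iterations < maxIterations);

        return iterations;
    }
}
```

The thread is never aborted; the loop notices the timeout between slices of work and yields on its own, which is the behaviour the answer recommends.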
You could register the thread (Thread.CurrentThread) at the beginning of your CCR "Receive" handler (or in a method that calls your method via a delegate). Then you can do your periodic check and abort if necessary basically the same way you would have done it if you created the thread manually. The catch is that if you use your own Microsoft.Ccr.Core.Dispatcher with a fixed number of threads, I don't think there is a way to get those threads back once you abort them (based on my testing). So, if your dispatcher has 5 threads, you'll only be able to abort 5 times before posting will no longer work regardless of what tasks have been registered. However, if you construct a DispatcherQueue using the CLR thread pool, any CCR threads you abort will be replaced automatically and you won't have that problem. From what I've seen, although the CCR dispatcher is recommended, I think using the CLR thread pool is the way to go in this situation.