I have a synchronous method GetReports() whose return value will be used to set the data source of a UI control. It may take a while to run. Is the following an idiomatic way to call it asynchronously?
var l = new List<...>();
await Task.Run(() => l = GetReports().ToList());
UIControl.DataSource = l;
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive.Windows.Forms and add using System.Reactive.Linq; - then you can do this:
IDisposable subscription =
    Observable
        .Start(() => GetReports().ToList())
        .ObserveOn(UIControl)
        .Subscribe(list => UIControl.DataSource = list);
This nicely pushes the work to a background thread and then marshals it back to the UI thread before updating the DataSource.
If you need to cancel before it has finished, just call subscription.Dispose().
If your call to GetReports is cancellable then you can do this:
IDisposable subscription =
    Observable
        .FromAsync(ct => GetReports(ct))
        .Select(x => x.ToList())
        .ObserveOn(UIControl)
        .Subscribe(list => UIControl.DataSource = list);
Calling subscription.Dispose() will now also cancel the task.
If you are after responsive UI and to run a long running CPU workload (and not Scalability as such) then this is fine and will achieve what you want. Essentially it will
1. Start a new thread (term used loosely)
2. Create a continuation
3. Give back the thread that calls it (in your case the UI thread)
4. Execute the workload
5. Run the continuation
   5a. Execute everything after the await on the thread you called it on
Although Tasks aren't threads, you will find this will steal a thread from the Thread Pool to do your workload, and it will free up the UI Thread until it's finished.
You could also do the same thing with the older style Task.Run and ContinueWith.
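For example, a rough sketch of that older style might look like this (using the names from the question; a sketch, not a drop-in replacement):

Task.Run(() => GetReports().ToList())
    .ContinueWith(t => UIControl.DataSource = t.Result,
        TaskScheduler.FromCurrentSynchronizationContext());

Note that TaskScheduler.FromCurrentSynchronizationContext() must be called on the UI thread so the continuation is marshalled back there.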
There is another school of thought as well: if you use TaskFactory.StartNew with TaskCreationOptions.LongRunning, it hints to the default TaskScheduler that you want a thread external to the Thread Pool. This has the advantage of leaving the Thread Pool with more resources.
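A minimal sketch of that hint, assuming the same GetReports method from the question:

var task = Task.Factory.StartNew(
    () => GetReports().ToList(),
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);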
That said, TaskFactory.StartNew is the granddaddy of the Task creation methods; it has its own quirks, and you should probably only use it when you specifically feel the need to do so. I would just stick with what you have.
One last note: although it seems like a good idea to wrap this workload in a method and expose it as async, it's generally not a good idea; if you must wrap, it's best left to the caller to decide these things. So once again you are doing the right thing. Stephen Cleary talks about Fake Async and Async Wrappers, and the reasons why you shouldn't need them, in Task.Run Etiquette and Proper Usage.
Actually I want to have one dedicated worker thread for handling events from both the main thread and other worker threads. This thread must also be able to invoke delegates in other threads. (The thread receives commands from the main thread, executes some of them in other worker threads, processes the completion and progress events from these commands, and informs the main thread about how it is doing.)
All of this could be done manually by implementing an analogue of a message queue of delegates in the desired thread. However, I would prefer a higher-level approach if one exists. From the documentation I got the impression that the Dispatcher class is well suited for this purpose. I also got the feeling that an object of this class can be created in any thread, but I didn't find any example. Is my feeling wrong?
There is nothing built-in, but it is quite easy to create a custom TaskScheduler that schedules work on a specific thread. Most implementations use the handy BlockingCollection class as a queue. Here is one implementation, and here is another one. You could customize any of these implementations to your needs, and then use it like this:
var myScheduler = new SingleThreadTaskScheduler();
var task = Task.Factory.StartNew(() => DoSomething(), default,
TaskCreationOptions.None, myScheduler);
There is also the option to use the fancy TaskFactory class, so that you can start tasks on this scheduler with less code:
var myFactory = new TaskFactory(new SingleThreadTaskScheduler());
var task = myFactory.StartNew(() => DoSomething());
Personally I am not a fan of this class, because I find the similarity between TaskFactory and the Task.Factory property confusing.
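For reference, here is a minimal sketch of what such a SingleThreadTaskScheduler might look like, roughly along the lines of the linked implementations (simplified, not production-hardened):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread _thread;

    public SingleThreadTaskScheduler()
    {
        _thread = new Thread(() =>
        {
            // Consume and execute queued tasks one by one on this dedicated thread.
            foreach (Task task in _queue.GetConsumingEnumerable())
                TryExecuteTask(task);
        });
        _thread.IsBackground = true;
        _thread.Start();
    }

    protected override void QueueTask(Task task) => _queue.Add(task);

    // Never inline; every task must run on the dedicated thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose() => _queue.CompleteAdding();
}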
Consider that we have two I/O-bound tasks that need to be processed for N elements. We can call the two tasks A and B. B can only be run after A has produced a result.
We can accomplish this in two ways. (Please ignore cases of Access to modified closure.)
Task.Run way:
List<Task> workers = new List<Task>();
for (int i = 0; i < N; i++)
{
    workers.Add(Task.Run(async () =>
    {
        await A(i);
        await B(i);
    }));
}
await Task.WhenAll(workers);
Classic Fork/Join:
List<Task> workersA = new List<Task>();
List<Task> workersB = new List<Task>();
for (int i = 0; i < N; i++)
{
    workersA.Add(A(i));
}
await Task.WhenAll(workersA);
for (int i = 0; i < N; i++)
{
    workersB.Add(B(i));
}
await Task.WhenAll(workersB);
Alternatively, this can also be done in the following way:
List<Task> workers = new List<Task>();
for (int i = 0; i < N; i++)
{
    workers.Add(A(i));
}
for (int i = 0; i < N; i++)
{
    await workers[i];
    workers[i] = B(i);
}
await Task.WhenAll(workers);
My concern is that the MSDN docs state that we should never use Task.Run for I/O operations.
Taking that into consideration, what's the best approach to handle this case then?
Correct me if I'm wrong, but we want to avoid using Task.Run because it effectively queues the work to thread-pool threads, whereas if we just use await, no thread is tied up while waiting (since the operations are I/O-bound).
I really wish to go down the Task.Run route, but if it ends up using threads for no apparent reason/does additional overhead, then it's a no-go.
I really wish to go down the Task.Run route
Why?
but if it ends up using threads for no apparent reason, then it's a no-go.
The documentation says it:
Queues the specified work to run on the ThreadPool
That doesn't necessarily mean a brand new thread for every time you call Task.Run. It might, but not necessarily. All you can guarantee is that it will run on a thread that is not the current one.
You have no control over how many threads get created to do all that work. But the recommendation to not use Task.Run for I/O operations is sound. It's needless overhead for no gain. It will be less efficient.
Either of your other solutions would work fine. Your last solution might finish quicker since you are starting the calls to B() sooner (you only wait for the first A() to finish before starting to call B() instead of waiting for them all to complete).
Update based on Theodor's answer: We're both right :) It's important to know that all the code in an async method before the first await (and the code after it, unless you specify otherwise) runs in the same context it was started from. In a desktop app, that's the UI thread. The waiting itself is asynchronous, so the UI thread is freed while waiting. But if there is any CPU-heavy work in that method, it will block the UI thread.
So Theodor is saying that you can use Task.Run to get it off the UI thread ASAP and guarantee it will never block the UI thread. While that's true, you cannot blindly use that advice everywhere. For one, you may need to do something in the UI after the I/O operation, and that must be done on the UI thread. If you've run it with Task.Run, then you have to make sure to marshal back to the UI thread for that work.
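For instance, in an async event handler the await itself takes care of that: the code after await Task.Run(...) resumes on the UI thread. A sketch (LoadReportText and statusLabel are placeholder names):

private async void RefreshButton_Click(object sender, EventArgs e)
{
    // Runs on a thread-pool thread; the UI stays responsive in the meantime.
    string text = await Task.Run(() => LoadReportText());

    // Back on the UI thread here, so touching controls is safe.
    statusLabel.Text = text;
}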
But if the async method you call has enough CPU-bound work that it freezes the UI, then it's not strictly an I/O operation and the advice of "Use Task.Run for CPU-bound work, and async/await for I/O" still fits.
All I can say is: try it. If you find that whatever you're doing freezes the UI, then use Task.Run. If you find that it doesn't, then Task.Run is needless overhead (not much, mind you, but still needless, and it gets worse if you're doing it in a loop like you are).
And all that really applies to desktop apps. If you're in ASP.NET then Task.Run won't do anything for you unless you're trying to do something in parallel. In ASP.NET, there is no "UI thread", so it doesn't matter which thread you do the work on. You just want to make sure you don't lock the thread while waiting (since there are a limited number of threads in ASP.NET).
"If the work you have is I/O-bound, use async and await without Task.Run. You should not use the Task Parallel Library. The reason for this is outlined in the Async in Depth article."
This piece of advice, although it comes from Microsoft's own site, is misleading. By discouraging Task.Run for I/O operations, the author probably had this in mind:
var data = await Task.Run(() =>
{
return webClient.DownloadString(url); // Blocking call
});
...which is indeed bad because it blocks a thread-pool thread. But using Task.Run with an async delegate is perfectly fine:
var data = await Task.Run(async () =>
{
return await webClient.DownloadStringTaskAsync(url); // Async call
});
Actually in my opinion this is the preferred way of initiating asynchronous operations from the event handlers of a UI application, because it ensures that the UI thread will be freed immediately. If instead you follow the article's advice and omit the Task.Run:
private async void Button1_Click(object sender, EventArgs args)
{
var data = await webClient.DownloadStringTaskAsync(url);
}
...then you risk the async method not being 100% async, in which case it may block the UI thread. This is a tiny concern for built-in async methods like DownloadStringTaskAsync, which are written by experts, but it becomes a greater concern for third-party async methods, and an even greater concern for async methods written by the developers themselves!
So regarding the options in your question, I believe that the first one (the Task.Run way) is the safest and the most efficient. The second one will await all the A tasks and all the B tasks separately, so the total duration will be at best Max(A) + Max(B), which statistically should be longer than Max(A + B).
If we fill a list of Tasks that need to do both CPU-bound and I/O-bound work by simply passing their method declaration to that list (not by creating a new task and manually scheduling it with Task.Start), how exactly are these tasks handled?
I know that they are not done in parallel, but concurrently.
Does that mean that a single thread will move through them, and that this thread might not be the same thread-pool thread each time, nor the same thread that added them to the list and started waiting for them all to complete?
EDIT: My question is about how exactly these items are handled in the list concurrently - is the calling thread moving through them, or something else is going on?
Code for those that need code:
public async Task SomeFancyMethod(int i)
{
    doCPUBoundWork(i);
    await doIOBoundWork(i);
}

// Main thread
List<Task> someFancyTaskList = new List<Task>();
for (int i = 0; i < 10; i++)
    someFancyTaskList.Add(SomeFancyMethod(i));

// Do various other things here --
// how are the items handled in the meantime?
await Task.WhenAll(someFancyTaskList);
Thank you.
Asynchronous methods always start running synchronously. The magic happens at the first await. When the await keyword sees an incomplete Task, the method returns its own incomplete Task to the caller. If it sees a completed Task, execution continues synchronously.
So at this line:
someFancyTaskList.Add(SomeFancyMethod(i));
You're calling SomeFancyMethod(i), which will:
Run doCPUBoundWork(i) synchronously.
Run doIOBoundWork(i).
If doIOBoundWork(i) returns an incomplete Task, then the await in SomeFancyMethod will return its own incomplete Task.
Only then will the returned Task be added to your list and your loop will continue. So the CPU-bound work is happening sequentially (one after the other).
There is some more reading about this here: Control flow in async programs (C#)
As each I/O operation completes, the continuations of those tasks are scheduled. How those are done depends on the type of application - particularly, if there is a context that it needs to return to (desktop and ASP.NET do unless you specify ConfigureAwait(false), ASP.NET Core doesn't). So they might run sequentially on the same thread, or in parallel on ThreadPool threads.
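For example, you can opt out of resuming on the captured context with ConfigureAwait(false), in which case the continuation may run on a thread-pool thread. A sketch using the question's method names:

public async Task SomeFancyMethod(int i)
{
    doCPUBoundWork(i);
    // false = don't capture and resume on the current (e.g. UI) context.
    await doIOBoundWork(i).ConfigureAwait(false);
    // Code placed here may now run on a thread-pool thread.
}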
If you want to immediately move the CPU-bound work to another thread to run that in parallel, you can use Task.Run:
someFancyTaskList.Add(Task.Run(() => SomeFancyMethod(i)));
If this is in a desktop application, then this would be wise, since you want to keep CPU-heavy work off of the UI thread. However, then you've lost your context in SomeFancyMethod, which may or may not matter to you. In a desktop app, you can always marshal calls back to the UI thread fairly easily.
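For example, in WinForms one way to hop back from code running under Task.Run is Control.Invoke/BeginInvoke, which marshal the delegate onto the thread that owns the control (a sketch; statusLabel is a placeholder):

statusLabel.BeginInvoke(new Action(() => statusLabel.Text = "Working..."));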
I assume you don't mean passing their method declaration, but just invoking the method, like so:
var tasks = new Task[] { MethodAsync("foo"),
MethodAsync("bar") };
And we'll compare that to using Task.Run:
var tasks = new Task[] { Task.Run(() => MethodAsync("foo")),
Task.Run(() => MethodAsync("bar")) };
First, let's get the quick answer out of the way: the first variant will have lower or equal parallelism compared to the second variant. Parts of MethodAsync will run on the caller's thread in the first case, but not in the second. How much this actually affects parallelism depends entirely on the implementation of MethodAsync.
To get a bit deeper, we need to understand how async methods work. We have a method like:
async Task MethodAsync(string argument)
{
    DoSomePreparationWork();
    await WaitForIO();
    await DoSomeOtherWork();
}
What happens when you call such a method? There is no magic. The method is a method like any other, just rewritten as a state machine (similar to how yield return works). It runs like any other method until it encounters the first await. At that point, it may or may not return a Task object. You may or may not await that Task object in the caller code; ideally, your code should not depend on the difference. Just like yield return, awaiting a (non-completed!) task returns control to the caller of the method. Essentially, the contract is:
If you have CPU work to do, use my thread.
If whatever you do would mean the thread isn't going to use the CPU, return a promise of the result (a Task object) to the caller.
This lets you maximize how much useful CPU work each thread is doing. If the asynchronous operation doesn't need the CPU, it lets the caller do something else. It doesn't inherently allow for parallelism, but it gives you the tools to do any kind of asynchronous operation, including parallel operations. One of the operations you can do is Task.Run, which is just another asynchronous method that returns a task, but which returns to the caller immediately.
So, the difference between:
MethodAsync("foo");
MethodAsync("bar");
and
Task.Run(() => MethodAsync("foo"));
Task.Run(() => MethodAsync("bar"));
is that the former will return (and continue to execute the next MethodAsync) after it reaches the first await on a non-completed task, while the latter will always return immediately.
You should usually decide based on your actual requirements:
Do you need to use the CPU efficiently and minimize context switching etc., or do you expect the async method to have negligible CPU work to do? Invoke the method directly.
Do you want to encourage parallelism or do you expect the async method to do interesting amounts of CPU work? Use Task.Run.
Here is your code rewritten without async/await, with old-school continuations instead. Hopefully it will make it easier to understand what's going on.
public Task CompoundMethodAsync(int i)
{
    doCPUBoundWork(i);
    return doIOBoundWorkAsync(i).ContinueWith(_ =>
    {
        doMoreCPUBoundWork(i);
    });
}

// Main thread
var tasks = new List<Task>();
for (int i = 0; i < 10; i++)
{
    Task task = CompoundMethodAsync(i);
    tasks.Add(task);
}

// The doCPUBoundWork has already run synchronously 10 times at this point.
// Do various things while the compound tasks are progressing concurrently.
Task.WhenAll(tasks).ContinueWith(_ =>
{
    // The doIOBoundWorkAsync/doMoreCPUBoundWork have completed 10 times at this point.
    // Do various things after all compound tasks have been completed.
});
// No code should exist here. Move everything inside the continuation above.
I have this function that brute-forces GUIDs to recover IDs (the GUIDs are hashes of the IDs, so the only way to reverse the procedure is to brute-force through 2^32-1 (yes, that's right) possible IDs).
I used await Task.Delay(1); to "refresh" the UI thread so everything looks natural; however, this slows down the process too much. If I were to use await Task.Delay(0); instead, the whole app would freeze and I wouldn't know the current progress of the search (reported via the two IProgress<T> instances for the progress bar).
public async Task<Dictionary<string, string>> GetSteamIds(Dictionary<string, string> GUIDs, IProgress<UInt32> progress, IProgress<double> progress2)
{
    List<string> Progress = new List<string>();
    foreach (string GUID in GUIDs.Keys)
    {
        Progress.Add(GUID);
    }
    for (; Iteration <= UInt32.MaxValue; Iteration++)
    {
        await Task.Delay(0);
        string guid = CalculateGUID(MinAcc + Iteration);
        if (GUIDs.ContainsKey(guid))
        {
            GUIDs[guid] = Convert.ToString((Int64)(MinAcc + Iteration));
        }
        progress.Report(Iteration);
        progress2.Report(1.0 * Iteration / UInt32.MaxValue * 100);
        if (Progress.Count == 0)
        {
            if (progress2 != null)
            {
                progress2.Report(100);
            }
            break;
        }
    }
    return GUIDs;
}
I do not know how to mitigate this: even though my whole method is async and the call is async, it still runs on the "UI thread", and I don't know how to work around this.
Naturally, I would appreciate a good answer that works, but I would also appreciate some elaboration to how this all works. I've read quite a bit of articles but none really explained it to me in a way for me to understand everything.
Adding the async keyword to a method doesn't actually make the method asynchronous, it just means that you're allowed to use the await keyword inside of the method. You've written an entirely synchronous CPU bound method and just tagged the method as async, making it still just a synchronous CPU bound method, and calling a synchronous CPU bound method from the UI thread is going to block the UI thread, preventing other UI actions from taking place.
Adding in an await Task.Delay(1) still makes your method do all of its long-running CPU-bound work on the UI thread; it just stops every once in a while to move to the end of the line, allowing other operations to run. Not only is this slowing down your work, it also means that the UI thread isn't able to perform other UI operations the vast majority of the time, as this process is using up so much of its time.
The solution is to simply not do the long running CPU bound work in the UI thread at all, and to offload that work to another thread.
Since this method of yours isn't actually asynchronous, you should remove the async keyword from the method (and change the return type accordingly). When you call this method, if you happen to need to do so from the UI thread, simply wrap it in a call to Task.Run, which will offload it to a thread pool thread. If that UI operation needs to do work after it has completed with the result, then that UI operation can await the result of Task.Run, and that method should be marked as async, because it really is doing its work asynchronously.
Your GetSteamIds method is entirely synchronous and CPU-bound, so it should have a synchronous signature:
public Dictionary<string, string> GetSteamIds(Dictionary<string,string> GUIDs, IProgress<UInt32> progress, IProgress<double> progress2);
Now, when you want to invoke this (synchronous, CPU-bound) method from the UI thread, use await Task.Run to run it on a threadpool thread and (asynchronously) get the result of the method:
var results = await Task.Run(() => GetSteamIds(guids, progress, progress2));
If you're using Progress<T>, then it will take care of handling progress updates on the UI thread for you; there's no need for the awkward Dispatcher or Control.Invoke.
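For instance, the wiring on the UI thread might look roughly like this (the control names are placeholders):

// Progress<T> captures the current SynchronizationContext, so these callbacks
// run on the UI thread even though GetSteamIds runs on a thread-pool thread.
var progress = new Progress<uint>(i => iterationLabel.Text = i.ToString());
var progress2 = new Progress<double>(p => searchProgressBar.Value = (int)p);

var results = await Task.Run(() => GetSteamIds(guids, progress, progress2));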
On a side note, with a very tight CPU-bound loop like this, you may find your progress reports themselves will slow down your UI and degrade your user experience. In that case, consider using my ObservableProgress, an IProgress<T> implementation that throttles progress updates.
Don't run the task on the UI thread.
You could call it like this:
var myTask = Task.Run(() => GetSteamIds(...));
And keep a reference to myTask somewhere so you can test myTask.IsCompleted to see when it's done.
Or, you can use ContinueWith to automatically run some other code when the Task completes. For example:
var myTask = Task.Run(() => GetSteamIds(...)).ContinueWith(t =>
{
    // do something here now that the task is done
}, TaskScheduler.FromCurrentSynchronizationContext());
The TaskScheduler.FromCurrentSynchronizationContext() is to make sure that the continuation code runs on the UI thread (since you will likely want to update the UI with the results).
With this usage, there is no need for your method to be asynchronous at all (it does not need to return Task<>).
Or use BackgroundWorker, as suggested in the comments.
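If you go that route, a minimal BackgroundWorker sketch might look like this (progressBar1 is a placeholder; your brute-force loop goes inside DoWork):

var worker = new BackgroundWorker { WorkerReportsProgress = true };
worker.DoWork += (s, e) =>
{
    // Long-running CPU-bound work runs here, off the UI thread.
    // Call ((BackgroundWorker)s).ReportProgress(percent) as you go.
};
worker.ProgressChanged += (s, e) => progressBar1.Value = e.ProgressPercentage;
worker.RunWorkerCompleted += (s, e) => { /* update the UI with the result */ };
worker.RunWorkerAsync();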
When you call await Task.Delay(1), it actually yields control back to the UI thread. The problem is probably not in the method you are calling, but in how the method is called.
So you should remove the Task.Delay() from the method and show us the call site of the method so we can help you further.
I've looked all over and I can't find an answer.
Is it better, worse, or indifferent to use:
{
    ...
    RefreshPaintDelegate PaintDelegate = new RefreshPaintDelegate(RefreshPaint);
    Control.Invoke(PaintDelegate);
}

protected void RefreshPaint()
{
    this.Refresh();
}
...or...
Task.Factory.StartNew(() =>
{
    this.Refresh();
},
CancellationToken.None,
TaskCreationOptions.None,
uiScheduler);
Assuming that uiScheduler is a scheduler that delegates calls to the UI thread, I would say that functionally the two are interchangeable (with the exception that the call to Control.Invoke will block until the call completes, whereas the Task will not; however, you can always use Control.BeginInvoke to make them semantically equivalent).
From a semantic point of view, I'd say that using Control.Invoke(PaintDelegate) is the better approach. When you use a Task, you are making an implicit declaration that you want to perform a unit of work, and typically that unit of work is scheduled along with other units of work; it is the scheduler that determines how the work is delegated (typically it is multi-threaded, but in this case it is marshaled to the UI thread). It should also be said that there is no clear link between the uiScheduler and the Control whose UI thread the call should be made on (typically they are all the same, but it is possible, though very rare, to have multiple UI threads).
In using Control.Invoke, however, the intention is clear: you want to marshal the call to the UI thread that the Control is pumping messages on, and the call expresses that perfectly.
I think the best option, however, is to use a SynchronizationContext instance; it abstracts out the fact that you need to synchronize calls to that context, as opposed to the other two options, which are either ambiguous about the intent in the call (Task) or very specific in the way it is being done (Control.Invoke).
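A sketch of that approach:

// Capture the UI context once, while on the UI thread.
SynchronizationContext uiContext = SynchronizationContext.Current;

// Later, from any thread, post back to the UI thread asynchronously.
uiContext.Post(_ => this.Refresh(), null);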
It is not the same. The first version will block the calling thread until the UI thread is ready to invoke the method. For a non-blocking version, you should use Control.BeginInvoke, which returns immediately.
Apart from that (if you are comparing Task to a Thread Pool thread), there is little difference in using them.
[Edit]
In this case, there is no difference between Task.Factory.StartNew and Control.BeginInvoke (but not Invoke, as I wrote above), since there is only a single GUI thread that can execute your code. No matter how many calls you make using either of them, they will still execute sequentially when the UI thread becomes free.