In C# 4.0, we have Task in the System.Threading.Tasks namespace. What is the true difference between Thread and Task? I wrote some sample programs (with help from MSDN) for my own learning, using
Parallel.Invoke
Parallel.For
Parallel.ForEach
but I still have many doubts, as the idea is not so clear.
I initially searched Stack Overflow for a similar question, but maybe with this question title I was not able to find one. If anyone knows of the same type of question being posted here earlier, kindly give a link to it.
In computer science terms, a Task is a future or a promise. (Some people use those two terms synonymously, some use them differently, nobody can agree on a precise definition.) Basically, a Task<T> "promises" to return you a T, but not right now honey, I'm kinda busy, why don't you come back later?
A Thread is a way of fulfilling that promise. But not every Task needs a brand-new Thread. (In fact, creating a thread is often undesirable, because doing so is much more expensive than re-using an existing thread from the thread pool. More on that in a moment.) If the value you are waiting for comes from the filesystem or a database or the network, then there is no need for a thread to sit around and wait for the data when it can be servicing other requests. Instead, the Task might register a callback to receive the value(s) when they're ready.
In particular, the Task does not say why it is that it takes such a long time to return the value. It might be that it takes a long time to compute, or it might be that it takes a long time to fetch. Only in the former case would you use a Thread to run a Task. (In .NET, threads are freaking expensive, so you generally want to avoid them as much as possible and really only use them if you want to run multiple heavy computations on multiple CPUs. For example, in Windows, a thread weighs 12 KiByte (I think), in Linux, a thread weighs as little as 4 KiByte, in Erlang/BEAM even just 400 Byte. In .NET, it's 1 MiByte!)
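To make the "promise fulfilled by a callback" idea concrete, here is a minimal sketch. It assumes a reasonably recent C# compiler (async Main) and uses a timer as a stand-in for a filesystem, database or network completion; the names PromiseDemo and FetchLaterAsync are hypothetical and not from any library.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PromiseDemo
{
    // Hypothetical helper: returns a Task<int> that is completed later by a
    // timer callback. No thread sits blocked while the value is pending.
    public static Task<int> FetchLaterAsync(int value, int delayMs)
    {
        var tcs = new TaskCompletionSource<int>();

        // This constructor passes the timer itself as the callback state.
        var timer = new Timer(state =>
        {
            tcs.TrySetResult(value);      // fulfil the promise
            ((Timer)state).Dispose();     // one-shot timer, clean up
        });
        timer.Change(delayMs, Timeout.Infinite);

        return tcs.Task;
    }

    public static async Task Main()
    {
        Task<int> promise = FetchLaterAsync(42, 100);
        int result = await promise;       // the caller is not blocked while waiting
        Console.WriteLine(result);        // prints 42
    }
}
```

The callback does run briefly on a pool thread when the timer fires, but nothing is tied up during the 100 ms of waiting, which is the point the answer is making.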
A task is something you want done.
A thread is one of the many possible workers which performs that task.
In .NET 4.0 terms, a Task represents an asynchronous operation. One or more threads are used to complete that operation; the work may be broken up into chunks that are assigned to separate threads.
Thread
The bare-metal thing. You probably don't need to use it; you can probably use a LongRunning task and take the benefits from the TPL (Task Parallel Library), included in .NET Framework 4 (April 2010) and above (also .NET Core).
Tasks
Abstraction above the Threads. It uses the thread pool (unless you specify the task as a LongRunning operation, if so, a new thread is created under the hood for you).
Thread Pool
As the name suggests: a pool of threads. This is the .NET framework handling a limited number of threads for you. Why? Because opening 100 threads to execute expensive CPU operations on a Processor with just 8 cores definitely is not a good idea. The framework will maintain this pool for you, reusing the threads (not creating/killing them at each operation), and executing some of them in parallel, in a way that your CPU will not burn.
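The thread reuse described above can be observed with a small experiment. This is only a sketch: the class and method names are made up for illustration, and the number it prints depends on the machine and on how busy the pool is.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class PoolReuseDemo
{
    // Runs `taskCount` short tasks and returns how many distinct threads served them.
    public static int CountDistinctThreads(int taskCount)
    {
        var threadIds = new ConcurrentDictionary<int, bool>();

        Task[] tasks = Enumerable.Range(0, taskCount)
            .Select(_ => Task.Run(() =>
            {
                // Record the managed id of whichever pool thread picked this task up.
                threadIds.TryAdd(Environment.CurrentManagedThreadId, true);
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return threadIds.Count;
    }

    public static void Main()
    {
        int distinct = CountDistinctThreads(100);
        Console.WriteLine($"100 tasks ran on {distinct} thread(s)");
    }
}
```

On a typical machine the count is usually far below 100, because the pool hands the same few threads one task after another instead of creating a thread per task.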
OK, but when to use each one?
In short: always use tasks.
Task is an abstraction, so it is a lot easier to use. I advise you to always try to use tasks and if you face some problem that makes you need to handle a thread by yourself (probably 1% of the time) then use threads.
BUT be aware that:
I/O Bound: For blocking I/O-bound operations (database calls, reading/writing files, API calls, etc.) avoid using normal tasks; use LongRunning tasks (or threads if you need to). Using normal tasks would leave you with a thread pool in which a few threads are busy waiting and a lot of other tasks are waiting for their turn to use the pool.
CPU Bound: For CPU bound operations just use the normal tasks (that internally will use the thread pool) and be happy.
In addition to above points, it would be good to know that:
A task is by default a background task. You cannot have a foreground task. On the other hand a thread can be background or foreground (Use IsBackground property to change the behavior).
Tasks created in thread pool recycle the threads which helps save resources. So in most cases tasks should be your default choice.
If the operations are quick, it is much better to use a task instead of a thread. For long-running operations, tasks do not provide much advantage over threads.
A Task can be seen as a convenient and easy way to execute something asynchronously and in parallel.
Normally a Task is all you need, I cannot remember if I have ever used a thread for anything other than experimentation.
You can accomplish the same thing with a thread (with lots of effort) as you can with a task.
Thread
int result = 0;
Thread thread = new System.Threading.Thread(() => {
result = 1;
});
thread.Start();
thread.Join();
Console.WriteLine(result); //is 1
Task
int result = await Task.Run(() => {
return 1;
});
Console.WriteLine(result); //is 1
A task will by default use the thread pool, which saves resources, as creating threads can be expensive. You can see a Task as a higher-level abstraction over threads.
As this article points out, Task provides the following powerful features over Thread.
Tasks are tuned for leveraging multicore processors.
If the system has multiple Tasks then it makes use of the CLR thread pool internally, and so does not have the overhead associated with creating a dedicated thread using the Thread. Also reduces the context switching time among multiple threads.
Task can return a result. There is no direct mechanism to return the result from thread.
Wait on a set of Tasks, without a signaling construct.
We can chain Tasks together to execute one after the other.
Establish a parent/child relationship when one task is started from another task.
A child Task Exception can propagate to parent task.
Tasks support cancellation through the use of cancellation tokens.
Asynchronous implementation is easy in Task, using async and await keywords.
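A few of the features listed above (returning a result, waiting on a set of tasks, chaining, and cancellation) can be sketched in one small program. The class and method names are hypothetical, and the arithmetic is only a placeholder for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskFeaturesDemo
{
    public static async Task<int> SumOfSquaresAsync(int a, int b)
    {
        // Tasks can return results directly.
        Task<int> first = Task.Run(() => a * a);
        Task<int> second = Task.Run(() => b * b);

        // Wait on a set of tasks without any explicit signaling construct,
        // then chain a follow-up step that runs once both have finished.
        Task<int> sum = Task.WhenAll(first, second)
            .ContinueWith(t => t.Result[0] + t.Result[1]);

        return await sum;
    }

    public static async Task<bool> WasCancelledAsync()
    {
        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            // This work would never finish on its own; only cancellation ends it.
            Task work = Task.Run(async () =>
            {
                await Task.Delay(Timeout.Infinite, token);
            }, token);

            cts.Cancel();

            try
            {
                await work;
                return false;
            }
            catch (OperationCanceledException)
            {
                return true; // cancellation surfaced as an exception on await
            }
        }
    }

    public static async Task Main()
    {
        Console.WriteLine(await SumOfSquaresAsync(3, 4)); // prints 25
        Console.WriteLine(await WasCancelledAsync());     // prints True
    }
}
```

Doing the same with raw Thread objects would need shared variables for the results and some signaling primitive for the waiting and the cancellation.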
I usually use Task in WinForms as a simple background worker, so that the UI does not freeze. Here is an example of when I prefer using Task.
private async void buttonDownload_Click(object sender, EventArgs e)
{
    buttonDownload.Enabled = false;
    // Run the blocking download off the UI thread.
    await Task.Run(() => {
        using (var client = new WebClient())
        {
            client.DownloadFile("http://example.com/file.mpeg", "file.mpeg");
        }
    });
    // Back on the UI thread after the await.
    buttonDownload.Enabled = true;
}
VS
private void buttonDownload_Click(object sender, EventArgs e)
{
    buttonDownload.Enabled = false;
    Thread t = new Thread(() =>
    {
        using (var client = new WebClient())
        {
            client.DownloadFile("http://example.com/file.mpeg", "file.mpeg");
        }
        this.Invoke((MethodInvoker)delegate()
        {
            buttonDownload.Enabled = true;
        });
    });
    t.IsBackground = true;
    t.Start();
}
The difference is that you don't need to use MethodInvoker, and the code is shorter.
You can use Task to specify what you want to do and then attach that Task to a Thread, so that the Task is executed in that newly made Thread rather than on the GUI thread.
Use Task with TaskFactory.StartNew(Action action). Here you execute a delegate, so if you didn't use any thread it would be executed in the same thread (the GUI thread). If you specify a thread, you can execute this Task on a different thread. This is unnecessary work, because you can execute the delegate directly, or attach the delegate to a thread and execute it there. So don't use it; it's just unnecessary. If you intend to optimize your software, this is a good candidate for removal.
Please note that Action is a delegate.
A Task is like an operation that you want to perform. A Thread helps to manage those operations across multiple processing nodes. Task is the more lightweight option, as working with threads directly can lead to complex code management.
I suggest you always read MSDN (the best in the world): Task, Thread
So I am developing a UWP application that has a large number of threads. Previously I would start all the threads with System.Threading.Tasks.Task.Run(), save their thread handles to an array, then Task.WaitAll() for completion and get the results. This currently is taking too much memory though.
I have changed my code to wait for only a smaller number of threads and copy their results out before continuing on to more of the threads. Since the UWP implementation of Task does not implement IDisposable, what is the proper way to signal to the framework that I am done with a task so it can be garbage collected? I would like to read out the results of the threads after a certain number of them come in, and dispose of those threads' resources to make space for the next threads.
Thanks so much!
Just to point out an issue which might be degrading the performance of your application: you are deliberately blocking the thread until all Tasks complete rather than actually awaiting them. That would make sense if you are not performing asynchronous work inside them, but if you are, you should definitely switch to
Task.WhenAll rather than Task.WaitAll, such as this:
List<Task> tasks = new List<Task> { Method1(), Method2(), ... };
await Task.WhenAll(tasks);
This way, you are actually leveraging the asynchrony of your app, and you will not block the current thread until all the tasks are completed, like Task.WaitAll() does.
Since you are utilizing the Task.Run() method instead of Task.Factory.StartNew(), the TaskScheduler used is the default one, which uses threads from the thread pool. So you will not actually end up blocking the UI thread, but blocking many thread pool threads is also not good.
Quoting the Microsoft documentation on one of the cases where the thread pool should not be used:
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
Edit:
I do not need anything else but I will look into that! Thanks! So is there any way I can get it to run the Tasks like a FIFO with just the APIs available with the default thread pool?
You should take a look at continuations.
A continuation is nothing other than a task which is activated whenever its antecedent task or tasks have completed. If you have specific tasks which you only want to execute after another task has completed, you should look into continuations, since they are extremely flexible, and you can create really complex flows of Tasks to better suit your needs.
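As a rough sketch of how continuations could give the FIFO-like behaviour asked about, the following chains each job onto the previous one, so only one runs at a time and in submission order. ContinuationDemo and RunInOrderAsync are hypothetical names, and the jobs are trivial placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Runs the jobs strictly one after another and collects their results in order.
    public static Task<List<int>> RunInOrderAsync(IEnumerable<Func<int>> jobs)
    {
        var results = new List<int>();
        Task chain = Task.CompletedTask;

        foreach (Func<int> job in jobs)
        {
            // Each continuation is activated only after its antecedent has completed.
            chain = chain.ContinueWith(_ => results.Add(job()));
        }

        // A final continuation hands back the collected results.
        return chain.ContinueWith(_ => results);
    }

    public static async Task Main()
    {
        var jobs = new List<Func<int>> { () => 1, () => 2, () => 3 };
        List<int> results = await RunInOrderAsync(jobs);
        Console.WriteLine(string.Join(", ", results)); // prints 1, 2, 3
    }
}
```

Because every step waits for the one before it, the shared list needs no locking here; that would no longer hold if the continuations were allowed to run concurrently.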
Garbage collection in a .NET application always works the same way: when an object is no longer needed (nothing references it any more, for example because the variable holding it went out of scope), it becomes eligible for collection.
Why do you think the threads are consuming the memory? It is much more likely that the work running inside the threads is what is consuming the memory.
I'm new to all the parallel programming paradigms in .NET and I'd like to know two things:
How can I wait for all of my tasks to finish running without freezing up my main form?
Is my method of solving the following problem the best way to go about doing it?
I have a private async Task RunTask(string task) method that I need to run as many times as the number of tasks I have.
Inside a private async void button_click(object sender, EventArgs e) method, I have a List of strings called tasklist which holds the tasks I need to run. I'm running the following code:
Thread thread = new Thread(()=> Parallel.ForEach(tasklist, t => RunTask(t)));
thread.Start();
RunOnCompletionOfTasks();
I need to be able to utilize all my CPU cores and run all tasks in the shortest amount of time possible. I figured Parallel.ForEach was the best way to achieve this but my tasks are also async because they have functions that require waiting on other methods. My tasks also append one string to a List<string> object within the class during their execution.
Parallel.ForEach causes my Windows Form to freeze up, so I wrapped it in a thread. The problem with this is that RunOnCompletionOfTasks(); runs before my tasks have completed.
What's the most efficient way for me to run all my RunTask tasks and then run RunOnCompletionOfTasks upon completion without freezing up the Windows Form?
Please and thank you.
If you need to execute multiple Tasks (which are awaitable) then Parallel.ForEach may not be the best choice. It is truly designed for CPU bound processes and does not support async operations.
You may try to rewrite your code using Task.WhenAll():
var tasks = tasklist.Select(RunTask);
// assuming RunTask is declared as `async Task`
await Task.WhenAll(tasks);
This way you are leveraging the async/await pattern inside your method call.
If, instead, you realize that your tasks are not properly async (and are only CPU bound) you may try to execute Parallel.ForEach inside a simple Task
await Task.Run(() => Parallel.ForEach(tasklist, RunTask));
// assuming RunTask is not `async Task`
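Since the question mentions that each task appends a string to a shared List<string>, one possible variation is to let each task return its string and collect everything from Task.WhenAll instead. This is only a sketch: the RunTask body below is a made-up placeholder (a short delay and an upper-casing step), not the asker's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for the question's RunTask: awaits some work and
    // returns its string instead of appending to a shared List<string>.
    public static async Task<string> RunTask(string task)
    {
        await Task.Delay(50);              // placeholder for the real awaited work
        return task.ToUpperInvariant();
    }

    public static async Task<string[]> RunAllAsync(IEnumerable<string> tasklist)
    {
        // Start every task, then await them together; results keep the input order.
        IEnumerable<Task<string>> running = tasklist.Select(RunTask);
        return await Task.WhenAll(running);
    }

    public static async Task Main()
    {
        string[] results = await RunAllAsync(new[] { "a", "b", "c" });
        Console.WriteLine(string.Join(",", results)); // prints A,B,C
        // A call such as RunOnCompletionOfTasks() would go here, after the await.
    }
}
```

Returning values this way avoids concurrent writes to a List<string>, which is not thread-safe, and in a button handler the await keeps the form responsive while the tasks run.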
First of all, you need to distinguish parallel from asynchronous:
When you need to do something with I/O, asynchronous code usually comes into play.
When you need to do something computational (like archiving files or image processing), parallel code usually comes into play.
The async/await pattern is just a shortcut, and it's not a silver bullet.
If you just need to parallelize your computation, I suggest using the parallel facilities of the TPL, such as Parallel.ForEach; they work well and are optimized to get maximum performance.
If you are trying to perform some I/O operations, this can make sense, but mostly when processing fairly large amounts of data, as in a server app.
Your processor can't do more than it can.
No parallel stuff will help you if your task is not parallelizable.
Be aware that every context switch has some cost.
TaskScheduler will use a queue to reduce the thread count, but that is itself some extra work.
Creating a thread with new Thread is evil (here at least). Your processor will switch between threads more frequently (more threads, more switches).
Finally:
Try not to use TPL things when you don't know how they work. Just use the classes you need, and don't make optimizations that don't really optimize.
PS
Use ContinueWith in the UI. Anything in the TPL will return a task that has this method.
When updating the UI, be sure which thread you are currently on.
I'm trying to make a database call async for an ASP.NET application. If I understand things correctly, I do not want to utilize thread pool threads for async I/O calls so I can keep the thread pool processing requests. Will the code below chew up a thread from my thread pool or generate a background thread?
public IEnumerable<dynamic> DbCall(string sql)
{
return // DB Operation;
}
public Task<IEnumerable<dynamic>> DbCallAsync(string sql)
{
var task = new Task<IEnumerable<dynamic>>(() => this.DbCall(sql));
task.Start();
return task;
}
Yes, creating a task with the Task constructor and calling Start executes the code on another thread, in this case a thread pool thread.
You should be using a DB operation that is inherently asynchronous, not synchronous. You should not be using the Task constructor at all to construct a Task that represents an asynchronous operation. How you go about doing this will depend on what API you're using to perform your IO.
Tasks on the default scheduler run on the thread-pool. They do not, by default, start new threads. This answers your question.
That said, you misunderstand the purpose and inner workings of async IO. Async IO, while running, does not consume any thread at all. You are not using async IO, however. You are moving IO to the thread pool. This never helps in ASP.NET. It always reduces performance.
How could it possibly help to move blocking work to a different thread?! You are still blocking a thread. Just a different one. If your thread-pool is exhausted just increase the limits. No need to start threads manually.
Research why async is beneficial and when. Without this understanding you are not going to be successful using it.
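The claim that a pending asynchronous operation holds no thread can be illustrated with a small timing experiment. This is a sketch under a loud assumption: Task.Delay stands in for a genuinely asynchronous database call, and FakeDbCallAsync is a hypothetical name, not a real data-access API.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncIoDemo
{
    // Assumption: Task.Delay plays the role of a truly asynchronous I/O call.
    // While it is pending, no thread is blocked waiting for it.
    public static async Task<int> FakeDbCallAsync(int id)
    {
        await Task.Delay(200);
        return id;
    }

    // Starts `count` overlapping calls and returns the elapsed milliseconds.
    public static async Task<long> TimeManyCallsAsync(int count)
    {
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count).Select(FakeDbCallAsync));
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        long ms = await TimeManyCallsAsync(1000);
        Console.WriteLine($"1000 overlapping calls took about {ms} ms");
    }
}
```

If each of the 1000 calls blocked a thread for 200 ms (for example Thread.Sleep inside Task.Run), finishing in a comparable time would need roughly a thousand threads; here the calls overlap without occupying threads while they wait. With a real database, the equivalent would be the provider's own asynchronous methods rather than a synchronous call wrapped in a Task.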
What is the difference between the following?
ThreadPool.QueueUserWorkItem
vs
Task.Factory.StartNew
If the above code is called 500 times for some long running task, does it mean all the thread pool threads will be taken up?
Or will TPL (the 2nd option) be smart enough to take up only as many threads as there are processors, or fewer?
If you're going to start a long-running task with TPL, you should specify TaskCreationOptions.LongRunning, which will mean it doesn't schedule it on the thread-pool. (EDIT: As noted in comments, this is a scheduler-specific decision, and isn't a hard and fast guarantee, but I'd hope that any sensible production scheduler would avoid scheduling long-running tasks on a thread pool.)
You definitely shouldn't schedule a large number of long-running tasks on the thread pool yourself. I believe that these days the default size of the thread pool is pretty large (because it's often abused in this way) but fundamentally it shouldn't be used like this.
The point of the thread pool is to avoid short tasks taking a large hit from creating a new thread, compared with the time they're actually running. If the task will be running for a long time, the impact of creating a new thread will be relatively small anyway - and you don't want to end up potentially running out of thread pool threads. (It's less likely now, but I did experience it on earlier versions of .NET.)
Personally if I had the option, I'd definitely use TPL on the grounds that the Task API is pretty nice - but do remember to tell TPL that you expect the task to run for a long time.
EDIT: As noted in comments, see also the PFX team's blog post on choosing between the TPL and the thread pool:
In conclusion, I’ll reiterate what the CLR team’s ThreadPool developer has already stated:
Task is now the preferred way to queue work to the thread pool.
EDIT: Also from comments, don't forget that TPL allows you to use custom schedulers, if you really want to...
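A minimal sketch of how the LongRunning hint could be passed, and of one way to observe its effect. The names are hypothetical, and, as the answer notes, what the hint does is up to the scheduler; the behaviour described here is what the default scheduler typically does, not a guarantee.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Starts a task with the given options on the default scheduler and
    // reports whether it executed on a thread pool thread.
    public static Task<bool> RanOnPoolThread(TaskCreationOptions options)
    {
        return Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);
    }

    public static async Task Main()
    {
        // An ordinary task is queued to the thread pool.
        Console.WriteLine(await RanOnPoolThread(TaskCreationOptions.None));

        // With the hint, the default scheduler typically uses a dedicated thread.
        Console.WriteLine(await RanOnPoolThread(TaskCreationOptions.LongRunning));
    }
}
```

Passing TaskScheduler.Default explicitly keeps the experiment independent of whatever TaskScheduler.Current happens to be at the call site.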
No, using the Task.Factory.StartNew method (or the more modern Task.Run method) adds no extra cleverness to the way the ThreadPool threads are utilized. Calling Task.Factory.StartNew 500 times (with long-running tasks) is certainly going to saturate the ThreadPool, and will keep it saturated for a long time. That is not a good situation to be in, because a saturated ThreadPool negatively affects any other independent callbacks, timer events, async continuations etc. that may also be active during this 500-launched-tasks period.
The Task.Factory.StartNew method schedules the execution of the supplied Action on the TaskScheduler.Current, which by default is the TaskScheduler.Default, which is the internal ThreadPoolTaskScheduler class. Here is the implementation of the ThreadPoolTaskScheduler.QueueTask method:
protected internal override void QueueTask(Task task)
{
    if ((task.Options & TaskCreationOptions.LongRunning) != 0)
    {
        // Run LongRunning tasks on their own dedicated thread.
        Thread thread = new Thread(s_longRunningThreadWork);
        thread.IsBackground = true; // Keep this thread from blocking process shutdown
        thread.Start(task);
    }
    else
    {
        // Normal handling for non-LongRunning tasks.
        bool forceToGlobalQueue = ((task.Options & TaskCreationOptions.PreferFairness) != 0);
        ThreadPool.UnsafeQueueCustomWorkItem(task, forceToGlobalQueue);
    }
}
As you can see the execution of the task is scheduled on the ThreadPool anyway. The ThreadPool.UnsafeQueueCustomWorkItem is an internal method of the ThreadPool class, and has some nuances (bool forceGlobal) that are not publicly exposed. But there is nothing in it that changes the behavior of the ThreadPool when it becomes saturated¹. This behavior, currently, is not particularly sophisticated either. The thread-injection algorithm just injects one new thread in the pool every 500 msec, until the saturation incident ends.
¹ The ThreadPool is said to be saturated when the demand for work surpasses the current availability of threads, and the threshold SetMinThreads above which new threads are no longer created on demand has been reached.