Should we use async/await for downloading a huge chunk of data - c#

As per my understanding, async/await uses a ThreadPool thread to perform the asynchronous operation, and we prefer ThreadPool threads when the operation completes within a short span of time, so that the threads are freed early.
So if we use async/await or Task to download a huge amount of data, will it hurt application performance, since the ThreadPool thread will not be freed early and the ThreadPool will have to create a new thread (which is an expensive operation)?
One more thing: if async/await is not preferable in the above scenario, what is the alternative for downloading a huge amount of data? Should we create a new thread explicitly?
Please share your thoughts, and thanks in advance.

Async IO does not use threads while it runs. That's the point.
Async IO does not make an IO faster. It only changes the way it is started and completed. It will gain you zero performance for your big file download.

A correction: according to the documentation, and as I have explained, ThreadPool threads do not carry the overhead of creating threads. The pool therefore has the advantage of avoiding the cost of creating and disposing of threads.
Quoted:
Thread pools typically have a maximum number of threads. If all the threads are busy, additional tasks are put in queue until they can be serviced as threads become available.
So yes, having MANY downloads running simultaneously COULD outnumber the threads available in the ThreadPool.
Finally, your main question: yes, async/await is a good solution for a file download. A good tutorial I used some time back.

You should absolutely use async/await in this case. Using async/await does not block the calling thread so it does not cause creation of new threads.
The async and await keywords don't cause additional threads to be created. Async methods don't require multithreading because an async method doesn't run on its own thread.
And you are asking about IO operation, using async/await is a perfect fit for this:
The async-based approach to asynchronous programming is preferable to
existing approaches in almost every case. In particular, this approach
is better than BackgroundWorker for IO-bound operations because the
code is simpler and you don't have to guard against race conditions.
The MSDN article has more details.
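As a rough illustration of the answers above, here is a minimal sketch of a large download that streams the body to disk with async/await. The class and method names (LargeDownload, SaveStreamAsync, DownloadToFileAsync) and the 80 KB buffer size are assumptions made for this example, not something taken from the original posts.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class LargeDownload
{
    // One shared HttpClient for the whole application (hypothetical helper class).
    private static readonly HttpClient Client = new HttpClient();

    // Copies any source stream to a file asynchronously, one buffer at a time.
    public static async Task<long> SaveStreamAsync(
        Stream source, string destinationPath, CancellationToken cancellationToken = default)
    {
        using var file = new FileStream(
            destinationPath, FileMode.Create, FileAccess.Write, FileShare.None,
            bufferSize: 81920, useAsync: true);

        // The calling thread is released while each chunk is being read or written.
        await source.CopyToAsync(file, 81920, cancellationToken);
        return file.Length;
    }

    // Streams the HTTP response body to disk instead of buffering it all in memory.
    public static async Task<long> DownloadToFileAsync(
        string url, string destinationPath, CancellationToken cancellationToken = default)
    {
        // ResponseHeadersRead completes once the headers arrive, so the body can be streamed.
        using HttpResponseMessage response = await Client.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead, cancellationToken);
        response.EnsureSuccessStatusCode();

        using Stream body = await response.Content.ReadAsStreamAsync();
        return await SaveStreamAsync(body, destinationPath, cancellationToken);
    }
}
```

The download is not faster than a blocking one; the difference is that no thread sits idle while the bytes are in flight.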

Related

Is Blocking code really expensive on modern systems?

I'm trying to grasp a bit better the concepts of async programming (mostly for C#) and blocking/non blocking code.
In C#, if I call .Wait() on a Task , is it always considered "blocking" ?
I understand that the current thread will be blocked. However the thread is put in a "waiting" state (AFAIK), and AFAIK it will never be scheduled by the OS until woken up when the Task completed (I assume the thread is woken up by kernel magic)
In that case, the CPU time taken by this blocking operation should be negligible during the waiting period. Is it indeed the case?
So where do the advantages of async programming come from? Is it because it allows going beyond the 1000 or so threads that the OS would allow? Is it because the memory overhead per async task is lower than the overhead of a thread?
Keep in mind that the "event loop" that manages all the tasks in an async context also has work to do to manage the scheduling of all async tasks, bookkeeping, etc. Is it really less work than what the kernel has to do in the blocking case to manage threads?
Wait() will block your thread the same as calling a non-async I/O.
Blocking is not inherently inefficient. In fact, it can be more performant if you have a process that will have very few threads. Windows' scheduler actually has some interesting special designs for I/O-blocked threads which you can read about in the Windows Internals books, such as boosting a thread to front of the line if it's been waiting on an I/O for a long time.
However, it doesn't scale. Every thread you create has overhead: memory for stack and register space, thread-local storage used by your app and inside of .NET, cache thrashing caused by all the extra memory needed, context switching, and so on. It's generally not going to be an efficient use of resources especially when each thread will spend a majority of its time blocked.
Async takes advantage of the fact that conceptually we don't really need everything a thread has to offer -- we only want concurrency, so we can make more domain-relevant optimizations in how we use our resources.
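To make the scaling argument concrete, here is a small sketch that starts thousands of concurrent waits without creating a thread per operation. Task.Delay is a timer, used here only as a stand-in for real asynchronous I/O; the class name ManyWaits and the counts are assumptions for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ManyWaits
{
    // Starts operationCount simulated I/O waits and completes when all have finished.
    public static async Task<int> WaitAllAsync(int operationCount, int delayMs)
    {
        var pending = new Task[operationCount];
        for (int i = 0; i < operationCount; i++)
        {
            // Each pending delay is a small object plus a timer entry, not a blocked thread.
            pending[i] = Task.Delay(delayMs);
        }

        await Task.WhenAll(pending);
        return operationCount;
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();
        int completed = await WaitAllAsync(10_000, 500);
        Console.WriteLine($"{completed} waits finished in {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

Doing the same with 10,000 blocked threads would reserve stack space for every one of them, which is the overhead the answer describes.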
It rarely hurts a project to be async by default. If your app doesn't need to be hyper-optimized for scalability, it won't hurt or help you. If your app does, then it'll be a huge help. Things like async/await can just help you model your concurrency better, so regardless of your perf goals it can be useful.
Async I/O is moving towards an even cooler place: I/O APIs like Windows RIO and Linux's io_uring allow you to do I/O without even context switching. Currently .NET does not take advantage of these things, but PipeWriter and PipeReader were built with it in mind for the future.

Why does the ThreadPool have advantages over a thread-based approach?

My textbook says:
should you need to run hundreds or thousands of concurrent I/O-bound
operations, a thread-based approach consumes hundreds or thousands of
MB of memory purely in thread overhead.
and
when running multiple long-running tasks in parallel, performance can
suffer, then a thread-based approach is better
I'm confused: what's the difference between the overhead of a non-ThreadPool thread and that of a ThreadPool thread? How is the overhead related to I/O-bound work?
And finally, why is a thread-based approach (for example, using new Thread(runMethod).Start()) better for long-running tasks?
The ThreadPool has a limited number of reusable threads. These threads are used for tasks (e.g. Task.Run). A task that executes for a longer period of time would block a thread so that it couldn't be reused for another task. So in order to always have enough ThreadPool threads available (e.g. for async/await, Parallel LINQ etc.), you should use ThreadPool-independent threads for this kind of task.
You do this by using Task.Factory.StartNew(Action, TaskCreationOptions) (or any other overload that accepts a TaskCreationOptions value) and passing TaskCreationOptions.LongRunning. LongRunning is a hint to the scheduler; with the default scheduler it results in a new thread that is independent of the ThreadPool.
So for all long-running and blocking I/O-based tasks, like reading a file or a database, you are supposed to use ThreadPool-independent threads by calling Task.Factory.StartNew(() => DoAction(), TaskCreationOptions.LongRunning);. You don't need new Thread(runMethod).Start() at all.
ThreadPool threads are more resource efficient since they are reusable: when you retrieve a ThreadPool thread, it has already been created. Creating new threads is always resource expensive. They need to be registered, call stacks must be created, locals must be copied, etc. This is why, when considering performance, reusable threads are the preferable choice as long as the workload is lightweight (short running).
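The difference can be observed with a short sketch like the one below. It relies on how the default scheduler currently treats the LongRunning hint, which is an implementation detail rather than a documented guarantee; the class and method names are assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Runs a delegate with the given options and reports whether it ran on a pool thread.
    public static Task<bool> RunsOnPoolThread(TaskCreationOptions options) =>
        Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);

    public static async Task Main()
    {
        bool normal = await RunsOnPoolThread(TaskCreationOptions.None);
        bool longRunning = await RunsOnPoolThread(TaskCreationOptions.LongRunning);
        Console.WriteLine($"Normal task on pool thread: {normal}");
        Console.WriteLine($"LongRunning task on pool thread: {longRunning}");
    }
}
```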

Why Use Async/Await Over Normal Threading or Tasks?

I've been reading a lot about async and await, and at first I didn't get it because I didn't properly understand threading or tasks. But after getting to grips with both I wonder: why use async/await if you're comfortable with threads?
The asynchrony of async/await can be achieved with thread signaling, Thread.Join(), etc. Is it merely about saving coding time and having "less" hassle?
Yes, it is syntactic sugar that makes dealing with threads much easier. It also makes the code easier to maintain, because the thread management is done by the runtime. await releases the thread immediately and allows that thread or another one to pick up where it left off, even if done on the main thread.
Like other abstractions, if you want complete control over the mechanisms under the covers, then you are still free to implement similar logic using thread signaling, etc.
If you are interested in seeing what async/await produces then you can use Reflector or ILSpy to decompile the generated code.
Read What does async & await generate? for a description of what C# 5.0 is doing on your behalf.
If await was just calling Task.Wait we wouldn't need special syntax and new APIs for that. The major difference is that async/await releases the current thread completely while waiting for completion. During an async IO there is no thread involved at all. The IO is just a small data structure inside of the kernel.
async/await uses callback-based waiting under the hood and makes all its nastiness (think of JavaScript callbacks...) go away.
Note, that async does not just move the work to a background thread (in general). It releases all threads involved.
Comparing async and await with threads is like comparing apples and pipe wrenches. From 10,000 feet they may look similar, but they are very different solutions to very different problems.
async and await are all about asynchronous programming; specifically, allowing a method to pause itself while it's waiting for some operation. When the method pauses, it returns to its caller (usually returning a task, which is completed when the method completes).
I assume you're familiar with threading, which is about managing threads. The closest parallel to a thread in the async world is Task.Run, which starts executing some code on a background thread and returns a task which is completed when that code completes.
async and await were carefully designed to be thread-agnostic. So they work quite well in the UI thread of WPF/Win8/WinForms/Silverlight/WP apps, keeping the UI thread responsive without tying up thread pool resources. They also work quite well in multithreaded scenarios such as ASP.NET.
If you're looking for a good intro to async/await, I wrote up one on my blog which has links to additional recommended reading.
There is a difference between the Threads and async/await feature.
Think about a situation, where you are calling a network to get some data from network. Here the Thread which is calling the Network Driver (probably running in some svchost process) keeps itself blocked, and consumes resources.
In case of Async/await, if the call is not network bound, it wraps the entire call into a callback using SynchronizationContext which is capable of getting callback from external process. This frees the Thread and the Thread will be available for other things to consume.
Asynchrony and concurrency are two different things: the former is just calling something in async mode, while the latter is really CPU bound. Threads are generally better when you need concurrency.
I wrote a blog post long ago describing these features.
C# 5.0 vNext - New Asynchronous Pattern
async/await does not use threads; that's one of the big advantages. It keeps your application responsive without the added complexity and overhead inherent in threads.
The purpose is to make it easy to keep an application responsive when dealing with long-running, I/O intensive operations. For example, it's great if you have to download a bunch of data from a web site, or read files from disk. Spinning up a new thread (or threads) is overkill in those cases.
The general rule is to use threads via Task.Run when dealing with CPU-bound operations, and async/await when dealing with I/O bound operations.
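A minimal sketch of that rule, assuming .NET Core 2.0 or later for File.ReadAllTextAsync; the class and method names are hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class CpuVersusIo
{
    // CPU-bound work: pure computation that keeps a core busy.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += (long)i * i;
        }
        return total;
    }

    // CPU-bound: push the computation onto a thread pool thread with Task.Run.
    public static Task<long> SumOfSquaresAsync(int n) => Task.Run(() => SumOfSquares(n));

    // I/O-bound: await the asynchronous API directly, with no Task.Run wrapper.
    public static Task<string> ReadTextAsync(string path) => File.ReadAllTextAsync(path);
}
```

Wrapping the file read in Task.Run would only move the waiting onto another thread, which is the overhead the answer calls overkill.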
Stephen Toub has a great blog post on async/await that I recommend you read.

The best way to implement massively parallel application in c#

I'm working on a network-bound application, which is supposed to have a lot (hundreds, may be thousands) of parallel processes.
I'm looking for the best way to implement it.
When I tried setting
ThreadPool.SetMaxThreads(int.MaxValue, int.MaxValue);
and then creating 1000 threads and making them do stuff in parallel, the application's execution became really jumpy.
I've heard somewhere that delegate.BeginInvoke is somehow better than new Thread(...), so I tried it and then opened the app in the debugger, and what I saw were parallel threads.
If I have to create lots and lots of threads, what is the best way to ensure that the application is going to run smoothly?
Have you tried the new await / async pattern in C# 5 / .NET 4.5?
I haven't got sources to hand about how this operates under the hood, but one of the most common use-cases of this new feature is waiting for IO bound stuff.
Threads are not lightweight objects. They are expensive to create and context switch to/from; hence the reason for the Thread Pool (pre-created and recycled). Most common solutions that involve networking or other IO ports utilise lower-level IO Completion Ports (there is a managed library here) to "wait" on a port, but where the thread can continue executing as normal.
BeginInvoke will utilise a Thread Pool thread, so it will be better than creating your own only if a thread is available. This approach, if used too heavily, can immediately result in thread starvation.
Setting such a high thread pool count is not going to work in the long run as threads are too heavy for what it appears you want to do.
Axum, a former Microsoft Research language, used to achieve massive parallelism that would have been suitable for this task. It operated similarly to Stackless Python or Erlang. Lots of concepts from Axum made their way into the parallelism drive into C# 5 and .NET 4.5.
Setting the ThreadPool.SetMaxThreads will only affect how many threads the thread pool has, and it won't make a difference regarding threads you create yourself with new Thread().
Go async (model, not keyword) as suggested by many.
You should follow the advice mentioned in the other answers and comments. As fsimonazzi says, creating new threads directly has nothing to do with the ThreadPool. For a quick test, lower the max worker and completionPort threads and use the ThreadPool.QueueUserWorkItem method. The ThreadPool will decide what your system can handle, queue your tasks and reuse threads whenever it can.
If your tasks are not compute-bound then you should also utilize asynchronous I/O. You do not want your worker threads to wait for I/O completion. You need those worker threads to return to the pool as quickly as possible and not block on I/O requests.
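One way to sketch "many network-bound operations without many threads" is to start them all as async operations and cap how many run at once with a SemaphoreSlim. This is an illustrative pattern, not the approach of any particular answer above; the names ThrottledIo and RunAllAsync are assumptions, and the network call is passed in as a delegate.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledIo
{
    // Runs operationCount async operations, at most maxConcurrency at a time.
    // Returns the highest number of operations observed in flight together.
    public static async Task<int> RunAllAsync(
        int operationCount, int maxConcurrency, Func<int, Task> operation)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);
        var sync = new object();
        int current = 0;
        int peak = 0;

        var tasks = Enumerable.Range(0, operationCount).Select(async i =>
        {
            // Waiting for the gate is asynchronous too, so no thread blocks here.
            await gate.WaitAsync();
            try
            {
                lock (sync)
                {
                    current++;
                    if (current > peak) peak = current;
                }
                await operation(i);
            }
            finally
            {
                lock (sync) { current--; }
                gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

A call such as RunAllAsync(1000, 50, i => SendRequestAsync(i)) would keep at most 50 requests in flight, where SendRequestAsync stands for whatever asynchronous network call the application makes.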

C# - ThreadPool vs Tasks

As some may have seen, .NET 4.0 added a new namespace, System.Threading.Tasks, which is basically what its name says: tasks. I've only been using it for a few days, having used ThreadPool before.
Which one is more efficient and less resource consuming? (Or just better overall?)
The objective of the Tasks namespace is to provide a pluggable architecture to make multi-tasking applications easier to write and more flexible.
The implementation uses a TaskScheduler object to control the handling of tasks. This has abstract and virtual members that you can override to create your own task handling. Members include for instance
protected internal abstract void QueueTask(Task task)
public virtual int MaximumConcurrencyLevel { get; }
There will be a tiny overhead to using the default implementation as there's a wrapper around the .NET threads implementation, but I'd not expect it to be huge.
There is a (draft) implementation of a custom TaskScheduler that implements multiple tasks on a single thread here.
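For a sense of what such a scheduler involves, here is a minimal sketch that runs every task on one dedicated thread. It is not the draft implementation linked above; the class name and the choice of BlockingCollection are assumptions, and edge cases such as waiting on a queued task from the worker thread itself are not handled.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SingleThreadTaskScheduler : TaskScheduler, IDisposable
{
    private readonly BlockingCollection<Task> _queue = new BlockingCollection<Task>();
    private readonly Thread _worker;

    public SingleThreadTaskScheduler()
    {
        // The single worker drains the queue until CompleteAdding is called.
        _worker = new Thread(() =>
        {
            foreach (Task task in _queue.GetConsumingEnumerable())
            {
                TryExecuteTask(task);
            }
        })
        { IsBackground = true };
        _worker.Start();
    }

    // Only one task runs at a time.
    public override int MaximumConcurrencyLevel => 1;

    // Called by the framework whenever a task is scheduled on this scheduler.
    protected override void QueueTask(Task task) => _queue.Add(task);

    // Inline execution is allowed only on the worker thread, and only for tasks not yet queued.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) =>
        Thread.CurrentThread == _worker && !taskWasPreviouslyQueued && TryExecuteTask(task);

    // Used by debuggers to list the pending tasks.
    protected override IEnumerable<Task> GetScheduledTasks() => _queue.ToArray();

    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

Tasks are directed to it by passing the scheduler instance to Task.Factory.StartNew or to a TaskFactory.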
which one is more efficient and less
resource consuming?
Irrelevant, there will be very little difference.
(Or just better overall)
The Task class will be the easier-to-use as it offers a very clean interface for starting and joining threads, and transfers exceptions. It also supports a (limited) form of load balancing.
"Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code."
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Thread
The bare metal thing, you probably don't need to use it, you probably can use a LongRunning Task and benefit from its facilities.
Tasks
Abstraction above the Threads. It uses the thread pool (unless you specify the task as a LongRunning operation, if so, a new thread is created under the hood for you).
Thread Pool
As the name suggests: a pool of threads. The .NET Framework manages a limited number of threads for you. Why? Because opening 100 threads to execute expensive CPU operations on a CPU with just 8 cores is definitely not a good idea. The framework maintains this pool for you, reusing the threads (not creating/killing them at each operation), and executing some of them in parallel in a way that will not burn your CPU.
OK, but when to use each one?
In summary: always use tasks.
Task is an abstraction, so it is a lot easier to use. I advise you to always try to use Tasks, and if you face some problem that makes you need to handle a thread by yourself (probably 1% of the time), then use threads.
BUT be aware that:
I/O Bound: For I/O bound operations (database calls, reading/writing files, API calls, etc.) never use normal tasks; use LongRunning tasks or threads if you need to, but not normal tasks. Otherwise you end up with a thread pool in which a few threads are busy and a lot of other tasks are waiting for their turn.
CPU Bound: For CPU bound operations just use the normal tasks and be happy.
Scheduling is an important aspect of parallel tasks.
Unlike threads, new tasks don't necessarily begin executing immediately. Instead, they are placed in a work queue. Tasks run when their associated task scheduler removes them from the queue, usually as cores become available. The task scheduler attempts to optimize overall throughput by controlling the system's degree of concurrency. As long as there are enough tasks and the tasks are sufficiently free of serializing dependencies, the program's performance scales with the number of available cores. In this way, tasks embody the concept of potential parallelism
As I saw on msdn http://msdn.microsoft.com/en-us/library/ff963549.aspx
ThreadPool and Task difference is very simple.
To understand task you should know about the threadpool.
A ThreadPool basically helps to manage and reuse free threads. In other words, a threadpool is a collection of background threads.
A simple definition of a task can be:
A task asynchronously manages a unit of work. In easy words, a Task doesn't create new threads. Instead it efficiently manages the threads of a threadpool. Tasks are executed by a TaskScheduler, which queues tasks onto threads.
Another good point to consider about task is, when you use ThreadPool, you don't have any way to abort or wait on the running threads (unless you do it manually in the method of thread), but using task it is possible. Please correct me if I'm wrong
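As a small sketch of that point: a Task can be awaited, and it can be cancelled cooperatively through a CancellationToken (this is a request the work has to observe, not a forced abort). The class and method names are assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitAndCancel
{
    // Simulated work that checks for cancellation on every iteration.
    private static async Task PollAsync(CancellationToken token)
    {
        while (true)
        {
            token.ThrowIfCancellationRequested();
            await Task.Delay(10, token);
        }
    }

    // Starts the work, requests cancellation after cancelAfterMs, and waits for the outcome.
    // Returns true if the task ended in the Canceled state.
    public static async Task<bool> RunThenCancelAsync(int cancelAfterMs)
    {
        using var cts = new CancellationTokenSource();
        Task worker = Task.Run(() => PollAsync(cts.Token), cts.Token);

        cts.CancelAfter(cancelAfterMs);
        try
        {
            await worker; // waiting on the task is built in
            return false;
        }
        catch (OperationCanceledException)
        {
            return worker.IsCanceled;
        }
    }
}
```

With ThreadPool.QueueUserWorkItem, the same waiting and cancellation signalling would have to be wired up by hand, for example with a ManualResetEvent and a shared flag.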
