Processing multiple inputs in the background in C# .NET4 application - c#

I am looking for an appropriate pattern and best modern way to solve the following problem:
My application is expecting inputs from multiple sources, for example: GUI, monitoring file-system, voice command, web request, etc. When an input is received I need to send it to some ProcessInput(InputData arg) method that would start processing the data in the background, without blocking the application to receive and process more data, and in some way return some results whenever the processing is complete. Depending on the input, the processing can take significantly different amounts of time. For starters I don't need the ability to check the progress or cancel the processing.
After reading a dozen of articles on MSDN and blogposts of some rock-star programmers I am really confused what pattern should be used here, and more importantly which features of .NET
My findings are:
ThreadPool.QueueUserWorkItem - easiest to understand, not very convinient about returning the results
BackgroundWorker - seems to be used only only for rather simple tasks, all workers run on single thread?
Event-based Asynchronous Pattern
Tasks in Task Parallel Library
C# 5 async/await - these seem to be shortcuts for Tasks from Task Parallel
Notes:
Performance is important, so taking advantage of multi-core system when possible would be really nice.
This is not a web application.
My problem reminds me of a TCP server(really any sort of server) where application is constantly listening for new connections/data on multiple sockets, I found the article Asynchronous Server Socket and I am curious if that pattern could be a possible solution for me.

My application is expecting inputs from multiple sources, for example: GUI, monitoring file-system, voice command, web request, etc.
I've done a whole lot of asynchronous programming in my time. I find it useful to distinguish between background operations and asynchronous events. A "background operation" is something that you initiate, and some time later it completes. An "asynchronous event" is something that's always going on independent of your program; you can subscribe, receive the events for a time, and then unsubscribe.
So, GUI inputs and file-system monitoring would be examples of asynchronous events; whereas web requests are background operations. Background operations can also be split into CPU-bound (e.g., processing some input in a pipeline) and I/O-bound (e.g., web request).
I make this distinction especially in .NET because different approaches have different strengths and weaknesses. When doing your evaluations, you also need to take into consideration how errors are propogated.
First, the options you've already found:
ThreadPool.QueueUserWorkItem - almost the worst option around. It can only handle background operations (no events), and doesn't handle I/O-bound operations well. Returning results and errors are both manual.
BackgroundWorker (BGW) - not the worst, but definitely not the best. It also only handles background operations (no events), and doesn't handle I/O-bound operations well. Each BGW runs in its own thread - which is bad, because you can't take advantage of the work-stealing self-balancing nature of the thread pool. Furthermore, the completion notifications are (usually) all queued to a single thread, which can cause a bottleneck in very busy systems.
Event-Based Asynchronous Pattern (EAP) - This is the first option from your list that would support asynchronous events as well as background operations, and it also can efficiently handle I/O-bound operations. However, it's somewhat difficult to program correctly, and it has the same problem as BGW where completion notifications are (usually) all queued to a single thread. (Note that BGW is the EAP applied to CPU-bound background operations). I wrote a library to help in writing EAP components, along with some EAP-based sockets. But I do not recommend this approach; there are better options available these days.
Tasks in Task Parallel Library - Task is the best option for background operations, both CPU-bound and I/O-bound. I review several background operation options on my blog - but that blog post does not address asychronous events at all.
C# 5 async/await - These allow a more natural expression of Task-based background operations. They also offer an easy way to synchronize back to the caller's context if you want to (useful for UI-initiated operations).
Of these options, async/await are the easiest to use, with Task a close second. The problem with those is that they were designed for background operations and not asynchronous events.
Any asynchronous event source may be consumed using asynchronous operations (e.g., Task) as long as you have a sufficient buffer for those events. When you have a buffer, you can just restart the asynchronous operation each time it completes. Some buffers are provided by the OS (e.g., sockets have read buffers, UI windows have message queues, etc), but you may have to provide other buffers yourself.
Having said that, here's my recommendations:
Task-based Asynchronous Pattern (TAP) - using either await/async or Task directly, use TAP to model at least your background operations.
TPL Dataflow (part of VS Async) - allows you to set up "pipelines" for data to travel through. Dataflow is based on Tasks. The disadvantage to Dataflow is that it's still developing and (IMO) not as stable as the rest of the Async support.
Reactive Extensions (Rx) - this is the only option that is specifically designed for asynchronous events, not just background operations. It's officially released (unlike VS Async and Dataflow), but the learning curve is steeper.
All three of these options are efficient (using the thread pool for any actual processing), and they all have well-defined semantics for error handling and results. I do recommend using TAP as much as possible; those parts can then easily be integrated into Dataflow or Rx.
You mentioned "voice commands" as one possible input source. You may be interested in a BuildWindows video where Stephen Toub sings -- and uses Dataflow to harmonize his voice in near-realtime. (Stephen Toub is one of the geniuses behind TPL, Dataflow, and Async).

IMO using a thread pool is the way to go WRT processing the input. Take a look at http://smartthreadpool.codeplex.com. It provides a very nice API (using generics) for waiting on results. You could use this in conjunction with Asynchronous Server Socket implementation. It may also be worth your while to take a look at Jeff Richter's Power Threading Lib: http://www.wintellect.com/Resources/Downloads

I am by no means expert in theese matters but I did some research on the subject recently and I'm very pleased with results achieved with MS TPL library. Tasks give you a nice wrapper around ThreadPool threads and are optimized for parallel processing so they ensure more performance. If you are able to use .NET 4.0 for your project, you should probably explore using tasks. They represent more advanced way of dealing with async operations and provide a nice way to cancel operations in progress using CancellationToken objects.
Here is the short example of accessing UI thread from different thread using tasks:
private void TaskUse()
{
var task = new Task<string>(() =>
{
Thread.Sleep(5000);
return "5 seconds passed!";
});
task.ContinueWith((tResult) =>
{
TestTextBox.Text = tResult.Result;
}, TaskScheduler.FromCurrentSynchronizationContext());
task.Start();
}
From previous example you can see how easy is to synchronize with UI thread with using TaskScheduler.FromCurrentSynchronizationContext(), assuming you call this method from UI thread. Tasks also provide optimizations for blocking operations like scenarios where you need to wait for service response and such by providing TaskCreationOptions.LongRunning enum value in Task constructor. This will assure that specified operation doesn't block processor core since maximum number of active tasks is determined by number of present processor cores.

Related

Async-Await vs ThreadPool vs MultiThreading on High-Performance Sockets (C10k Solutions?)

I'm really confused about async-awaits, pools and threads. The main problem starts with this question: "What can I do when I have to handle 10k socket I/O?" (aka The C10k Problem).
First, I tried to make a custom pooling architecture with threads
that uses one main Queue and multiple Threads to process all
incoming datas. It was a great experience about understanding
thread-safety and multi-threading but thread is an overkill
with async-await nowadays.
Later, I implemented a simple architecture with async-await but I
can't understand why "The async and await keywords don't cause
additional threads to be created." (from MSDN)? I think there
must be some threads to do jobs like BackgroundWorker.
Finally, I implemented another architecture with ThreadPool and it
looks like my first custom pooling.
Now, I think there should be someone else with me who confused about handling The C10k. My project is a dedicated (central) server for my game project that is hub/lobby server like MCSG's lobbies or COD's matchmaking servers. I'll do the login operations, game server command executions/queries and information serving (like version, patch).
Last part might be more specific about my project but I really need some good suggestions about real world solutions about multiple (heavy) data handling.
(Also yes, 1k-10k-100k connection handling depending on server hardware but this is a general question)
The key point: Choosing Between the Task Parallel Library and the ThreadPool (MSDN Blog)
[ADDITIONAL] Good (basic) things to read who wants to understand what are we talking about:
Threads
Async, Await
ThreadPool
BackgroundWorker
async/await is roughly analogous to the "Serve many clients with each thread, and use asynchronous I/O and completion notification" approach in your referenced article.
While async and await by themselves do not cause any additional threads, they will make use of thread pool threads if an async method resumes on a thread pool context. Note that the async interaction with ThreadPool is highly optimized; it is very doubtful that you can use Thread or ThreadPool to get the same performance (with a reasonable time for development).
If you can, I'd recommend using an existing protocol - e.g., SignalR. This will greatly simplify your code, since there are many (many) pitfalls to writing your own TCP/IP protocol. SignalR can be self-hosted or hosted on ASP.NET.
No. If we use asynchronous programming pattern that .NET introduced in 4.5, in most of the cases we need not to create manual thread by us. The compiler does the difficult work that the developer used to do. Creating a new thread is costly, it takes time. Unless we need to control a thread, then “Task-based Asynchronous Pattern (TAP)” and “Task Parallel Library (TPL)” is good enough for asynchronous and parallel programming. TAP and TPL uses Task. In general Task uses the thread from ThreadPool(A thread pool is a collection of threads already created and maintained by .NET framework. If we use Task, most of the cases we need not to use thread pool directly. A thread can do many more useful things. You can read more about Thread Pooling
You can avoid performance bottlenecks and enhance the overall responsiveness of your application by using asynchronous programming. Asynchrony is essential for activities that are potentially blocking, such as when your application accesses the web. Access to a web resource sometimes is slow or delayed. If such an activity is blocked within a synchronous process, the entire application must wait. In an asynchronous process, the application can continue with other work that doesn't depend on the web resource until the potentially blocking task finishes.
Await is specifically designed to deal with something taking time, most typically an I/O request. Which traditionally was done with a callback when the I/O request was complete. Writing code that relies on these callbacks is quite difficult, await greatly simplifies it. Await just takes care of dealing with the delay, it doesn't otherwise do anything that a thread does. The await expression, what's at the right of the await keyword, is what gets the job done. You can use Async with any method that returns a Task. The XxxxAsync() methods are just precooked ones in the .NET framework for common operations that take time. Like downloading data from a web server.
I would recommend you to read Asynchronous Programming with Async and Await

Why Use Async/Await Over Normal Threading or Tasks?

I've been reading a lot about async and await, and at first I didn't get it because I didn't properly understand threading or tasks. But after getting to grips with both I wonder: why use async/await if you're comfortable with threads?
The asynchronousy of async/await can be done with Thread signaling, or Thread.Join() etc. Is it merely for time saving coding and "less" hassle?
Yes, it is a syntactic sugar that makes dealing with threads much easier, it also makes the code easier to maintain, because the thread management is done by run-time. await release the thread immediately and allows that thread or another one to pick up where it left off, even if done on the main thread.
Like other abstractions, if you want complete control over the mechanisms under the covers, then you are still free to implement similar logic using thread signaling, etc.
If you are interested in seeing what async/await produces then you can use Reflector or ILSpy to decompile the generated code.
Read What does async & await generate? for a description of what C# 5.0 is doing on your behalf.
If await was just calling Task.Wait we wouldn't need special syntax and new APIs for that. The major difference is that async/await releases the current thread completely while waiting for completion. During an async IO there is no thread involved at all. The IO is just a small data structure inside of the kernel.
async/await uses callback-based waiting under the hood and makes all its nastiness (think of JavaScript callbacks...) go a way.
Note, that async does not just move the work to a background thread (in general). It releases all threads involved.
Comparing async and await with threads is like comparing apples and pipe wrenches. From 10,000 feet they may look similar, but they are very different solutions to very different problems.
async and await are all about asynchronous programming; specifically, allowing a method to pause itself while it's waiting for some operation. When the method pauses, it returns to its caller (usually returning a task, which is completed when the method completes).
I assume you're familiar with threading, which is about managing threads. The closest parallel to a thread in the async world is Task.Run, which starts executing some code on a background thread and returns a task which is completed when that code completes.
async and await were carefully designed to be thread-agnostic. So they work quite well in the UI thread of WPF/Win8/WinForms/Silverlight/WP apps, keeping the UI thread responsive without tying up thread pool resources. They also work quite well in multithreaded scenarios such as ASP.NET.
If you're looking for a good intro to async/await, I wrote up one on my blog which has links to additional recommended reading.
There is a difference between the Threads and async/await feature.
Think about a situation, where you are calling a network to get some data from network. Here the Thread which is calling the Network Driver (probably running in some svchost process) keeps itself blocked, and consumes resources.
In case of Async/await, if the call is not network bound, it wraps the entire call into a callback using SynchronizationContext which is capable of getting callback from external process. This frees the Thread and the Thread will be available for other things to consume.
Asynchrony and Concurrency are two different thing, the former is just calling something in async mode while the later is really cpu bound. Threads are generally better when you need concurrency.
I have written a blog long ago describing these features .
C# 5.0 vNext - New Asynchronous Pattern
async/await does not use threads; that's one of the big advantages. It keeps your application responsive without the added complexity and overhead inherent in threads.
The purpose is to make it easy to keep an application responsive when dealing with long-running, I/O intensive operations. For example, it's great if you have to download a bunch of data from a web site, or read files from disk. Spinning up a new thread (or threads) is overkill in those cases.
The general rule is to use threads via Task.Run when dealing with CPU-bound operations, and async/await when dealing with I/O bound operations.
Stephen Toub has a great blog post on async/await that I recommend you read.

Asynchronous methods(!) clarification in .net?

I've been reading a lot lately about this topic and , still I need to clarify something
The whole idea with asynchronous methods is Thread economy :
Allow many tasks to run on a few threads. this is done by using the hardware driver to do the job while releasing the thread back to the thread-pool so it can server other jobs.
please notice .
I'm not talking about asynchronous delegates which ties another thread (execute a task in parallel with the caller).
However I've seen 2 main types of asynchronous methods examples :
Code samples (from books) who only uses existing I/O asynchronous operations as beginXXX / endXX e.g. Stream.BeginRead.
And I couldn't find any asynchronous methods samples which don't use existing .net I/O operations e.g. Stream.BeginRead )
Code samples like this (and this). which doesnt actually invoking an asynchronous operation (although the author thinks he is - but he actually causes a thread to block !)
Question :
Does asynchronous methods are used only with .net I/O existing methods like BeginXXX , EndXXX ?
I mean , If I want to create my own asynchronous methods like BeginMyDelay(int ms,...){..} , EndMyDelay(...). I couldn't done it without tie a blocked thread to it....correct?
Thank you very much.
p.s. please notice this question is tagged as .net 4 and not .net4.5
You're talking about APM.
APM widely uses OS concept, known as IO Completion ports. That's why different IO operations are the best candidates to use APM.
You could write your own APM methods.
But, in fact, these methods will be either over existing APM methods, or they will be IO-bound, and will use some native OS mechanism (like FilesStream, which uses overlapped file IO).
For compute-bound asynchronous operations APM only will increase complexity, IMO.
A bit more clarification.
Work with hardware is asynchronous by its nature. Hardware needs a time to perform request - newtork card must send or receive data, HDD must read/write etc. If IO is synchronous, thread, which was generated IO request, is waiting for response. And here APM helps - you shouldn't wait, just execute something else, and when IO will be complete, I'll call you, says APM.
The main point - operation is performing outside of CPU.
When you're writing any compute-bound operation, which will use CPU for it execution without any IO, there's nothing to wait here. So, APM coludn't help - if you need CPU, you need thread - you need thread pool.
I think, but I'm not sure, that you can create your own asynchronous methods. For example creating a new thread and wait for it to finish some work (db query, ...).
In term of overall system performance probably it is not useful, as you say you just create another thread. But for example if you work on IIS, the original request thread can be used for other requests while you are waiting for the 'background' operation.
I think that IIS has a fixed number of threads (thread pool), so in this case can be useful.
I mean , If I want to create my own asynchronous methods like
BeginMyDelay(int ms,...){..} , EndMyDelay(...). I couldn't done it
without tie a blocked thread to it....correct?
While I've not dug into the implementation of async, I can't see any reason why one couldn't do this.
The simplest way would be to use existing libraries that help [e.g. timers] or some sort of event system IIRC.
However even if you don't want to use any library helpers then you're stuck with a problem... the 'blocked thread'.
Sure the code does look something like this:
while (true){
foreach (var item in WaitingTasks)
if (item.Ready())
/*fire item, and remove it from tasks*/;
/*Some blocking action*/
}
Thing is - 'Some blocking action' doesn't have to be 'blocking'. You could yield/sleep the thread, or use it to process some data. For example, the Unity Game Engine does a similar thing with Coroutines - where the same thread that processes all the code also checks to see if various coroutines [that have been delayed due to time] need to be updated. Replace /*Some blocking action*/ with ProcessGameLoop().
Hoe that helps, feel free to ask questions/post corrections etc.

Which threads work method I need to use?

I have audio player application (c# .NET 4.0 WPF) that gets an audio-stream from the web and plays it. The app also displays waveforms and spectrums and saves the audio to local disk. It also does a few more things.
My quetion is when I recive a new byte packet from the web and I need to play them (and maybe write them to local disk etc.), do I need use threads? I try to do all the things with the main thread and it seems to work well.
I can work with threadpool for every bytes packet that I received in my connection. Would this be a reasonable approach?
For this you can use the Task Parallel Library (TPL). The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework version 4. The purpose of the TPL is to make developers more productive by simplifying the process of adding parallelism and concurrency to applications. The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available. In addition, the TPL handles the partitioning of the work, the scheduling of threads on the ThreadPool, cancellation support, state management, and other low-level details.
Another option (if the operations you were performing were sufficiently long running) is the BackgroundWorker class. The BackgroundWorker component gives you the ability to execute time-consuming operations asynchronously ("in the background"), on a thread different from your application's main UI thread. To use a BackgroundWorker, you simply tell it what time-consuming worker method to execute in the background, and then you call the RunWorkerAsync method. Your calling thread continues to run normally while the worker method runs asynchronously. When the method is finished, the BackgroundWorker alerts the calling thread by firing the RunWorkerCompleted event, which optionally contains the results of the operation. This may not be the best option for you if you have many operations to undertake sequentially.
The next alternative that has been largely replaced by the TPL, is the Thread Class. This is not so easy to use at the TPL and you can do everything using the TPL as you can using the Thread Class (well almost) and the TPL is much more user friendly.
I hope this helps.
I suggest using 2 threads: in one you are downloading packets from web and putting them in queue(it can be UI thread if you are using async download operation) and in another thread you are analyzing queue and processing packets from it.

Design Pattern Alternative to Coroutines

Currently, I have a large number of C# computations (method calls) residing in a queue that will be run sequentially. Each computation will use some high-latency service (network, disk...).
I was going to use Mono coroutines to allow the next computation in the computation queue to continue while a previous computation is waiting for the high latency service to return. However, I prefer to not depend on Mono coroutines.
Is there a design pattern that's implementable in pure C# that will enable me to process additional computations while waiting for high latency services to return?
Thanks
Update:
I need to execute a huge number (>10000) of tasks, and each task will be using some high-latency service. On Windows, you can't create that much threads.
Update:
Basically, I need a design pattern that emulates the advantages (as follows) of tasklets in Stackless Python (http://www.stackless.com/)
Huge # of tasks
If a task blocks the next task in the queue executes
No wasted cpu cycle
Minimal overhead switching between tasks
You can simulate cooperative microthreading using IEnumerable. Unfortunately this won't work with blocking APIs, so you need to find APIs that you can poll, or which have callbacks that you can use for signalling.
Consider a method
IEnumerable Thread ()
{
//do some stuff
Foo ();
//co-operatively yield
yield null;
//do some more stuff
Bar ();
//sleep 2 seconds
yield new TimeSpan (2000);
}
The C# compiler will unwrap this into a state machine - but the appearance is that of a co-operative microthread.
The pattern is quite straightforward. You implement a "scheduler" that keeps a list of all the active IEnumerators. As it cycles through the list, it "runs" each one using MoveNext (). If the value of MoveNext is false, the thread has ended, and the scheduler removes it from the list. If it's true, then the scheduler accesses the Current property to determine the current state of the thread. If it's a TimeSpan, the thread wishes to sleep, and the scheduler moved it onto some queue that can be flushed back into the main list when the sleep timespans have ended.
You can use other return objects to implement other signalling mechanisms. For example, define some kind of WaitHandle. If the thread yields one of these, it can be moved to a waiting queue until the handle is signalled. Or you could support WaitAll by yielding an array of wait handles. You could even implement priorities.
I did a simple implementation of this scheduler in about 150LOC but I haven't got round to blogging the code yet. It was for our PhyreSharp PhyreEngine wrapper (which won't be public), where it seems to work pretty well for controlling a couple of hundred characters in one of our demos. We borrowed the concept from the Unity3D engine -- they have some online docs that explain it from a user point of view.
.NET 4.0 comes with extensive support for Task parallelism:
How to: Use Parallel.Invoke to Execute Simple Parallel Tasks
How to: Return a Value from a Task
How to: Chain Multiple Tasks with Continuations
I'd recommend using the Thread Pool to execute multiple tasks from your queue at once in manageable batches using a list of active tasks that feeds off of the task queue.
In this scenario your main worker thread would initially pop N tasks from the queue into the active tasks list to be dispatched to the thread pool (most likely using QueueUserWorkItem), where N represents a manageable amount that won't overload the thread pool, bog your app down with thread scheduling and synchronization costs, or suck up available memory due to the combined I/O memory overhead of each task.
Whenever a task signals completion to the worker thread, you can remove it from the active tasks list and add the next one from your task queue to be executed.
This will allow you to have a rolling set of N tasks from your queue. You can manipulate N to affect the performance characteristics and find what is best in your particular circumstances.
Since you are ultimately bottlenecked by hardware operations (disk I/O and network I/O, CPU) I imagine smaller is better. Two thread pool tasks working on disk I/O most likely won't execute faster than one.
You could also implement flexibility in the size and contents of the active task list by restricting it to a set number of particular type of task. For example if you are running on a machine with 4 cores, you might find that the highest performing configuration is four CPU-bound tasks running concurrently along with one disk-bound task and a network task.
If you already have one task classified as a disk IO task, you may choose to wait until it is complete before adding another disk IO task, and you may choose to schedule a CPU-bound or network-bound task in the meanwhile.
Hope this makes sense!
PS: Do you have any dependancies on the order of tasks?
You should definitely check out the Concurrency and Coordination Runtime. One of their samples describes exactly what you're talking about: you call out to long-latency services, and the CCR efficiently allows some other task to run while you wait. It can handle huge number of tasks because it doesn't need to spawn a thread for each one, though it will use all your cores if you ask it to.
Isn't this a conventional use of multi-threaded processing?
Have a look at patterns such as Reactor here
Writing it to use Async IO might be sufficient.
This can lead to nasy, hard to debug code without strong structure in the design.
You should take a look at this:
http://www.replicator.org/node/80
This should do exactly what you want. It is a hack, though.
Some more information about the "Reactive" pattern (as mentioned by another poster) with respect to an implementation in .NET; aka "Linq to Events"
http://themechanicalbride.blogspot.com/2009/07/introducing-rx-linq-to-events.html
-Oisin
In fact, if you use one thread for a task, you will lose the game. Think about why Node.js can support huge number of conections. Using a few number of thread with async IO!!! Async and await functions can help on this.
foreach (var task in tasks)
{
await SendAsync(task.value);
ReadAsync();
}
SendAsync() and ReadAsync() are faked functions to async IO call.
Task parallelism is also a good choose. But I am not sure which one is faster. You can test both of them
in your case.
Yes of course you can. You just need to build a dispatcher mechanism that will call back on a lambda that you provide and goes into a queue. All the code I write in unity uses this approach and I never use coroutines. I wrap methods that use coroutines such as WWW stuff to just get rid of it. In theory, coroutines can be faster because there is less overhead. Practically they introduce new syntax to a language to do a fairly trivial task and furthermore you can't follow the stack trace properly on an error in a co-routine because all you'll see is ->Next. You'll have to then implement the ability to run the tasks in the queue on another thread. However, there is parallel functions in the latest .net and you'd be essentially writing similar functionality. It wouldn't be many lines of code really.
If anyone is interested I would send the code, don't have it on me.

Categories