Design of asynchronous socket classes in C#


I've written a small asynchronous TCP server/client in C#...
... and I've just been thinking:
The C# API implements select and poll (Socket.Select and Socket.Poll), a classic but easy way to do async. Why did Microsoft introduce the BeginConnect/BeginSend family, which in my opinion has a more complicated design (and adds lines of code, too)?
Also, when using the BeginXXX() "trend", I noticed that System.Threading is required (for the events). Does that mean threads are involved too?

select and poll have two problems:
They are generally used in a single-threaded way, and for that reason they do not scale.
They require all IO to be dispatched through a central place that does the polling.
It is much nicer to be able to just specify a callback that will magically be called on completion. This scales automatically, and no central dispatch point is needed. Async IO in .NET is largely hassle-free. It just works (efficiently).
Async IO on Windows is threadless: while an IO is in progress, not a single thread is busy serving it. All async IO in .NET uses true async IO supported by the OS, meaning either overlapped IO or completion ports.
Look into async/await, which can also be used with sockets. It is the easiest way to use async IO that I know of, across all languages and platforms. Judged by ease of use, select and poll aren't even in the same league.
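As a rough illustration of the callback-free style described above, here is a minimal sketch, assuming .NET Framework 4.5 or later (or any .NET Core/.NET version). The class EchoDemo and its method names are hypothetical; TcpListener, TcpClient and NetworkStream.ReadAsync/WriteAsync are standard types. Neither side parks a thread while a read or write is pending.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical demo: an echo server and client written with async/await.
public static class EchoDemo
{
    // Accepts one client and echoes bytes back until the client stops sending.
    public static async Task ServeOneClientAsync(TcpListener listener)
    {
        using (TcpClient client = await listener.AcceptTcpClientAsync())
        using (NetworkStream stream = client.GetStream())
        {
            var buffer = new byte[4096];
            int read;
            // ReadAsync returns 0 once the remote side has shut down its sending direction.
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await stream.WriteAsync(buffer, 0, read);
            }
        }
    }

    // Connects, sends a message, and reads the echo back.
    public static async Task<string> SendAndReceiveAsync(int port, string message)
    {
        using (var client = new TcpClient())
        {
            await client.ConnectAsync(IPAddress.Loopback, port);
            NetworkStream stream = client.GetStream();
            byte[] payload = Encoding.UTF8.GetBytes(message);
            await stream.WriteAsync(payload, 0, payload.Length);
            client.Client.Shutdown(SocketShutdown.Send); // tell the server we are done sending

            var received = new byte[payload.Length];
            int total = 0;
            while (total < received.Length)
            {
                int n = await stream.ReadAsync(received, total, received.Length - total);
                if (n == 0) break; // server closed early
                total += n;
            }
            return Encoding.UTF8.GetString(received, 0, total);
        }
    }
}
```

A caller would start ServeOneClientAsync on a listener bound to IPAddress.Loopback, then call SendAndReceiveAsync with the listener's port; the awaited result is the echoed string.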

Related

Asynchronous programming in a Windows service: is it relevant?

In a Windows service we do not have any blocking UI thread, so is it relevant to use asynchronous programming inside a Windows service?

The alternatives are to either block (i.e., do nothing until the required data is available) or await (yield processing and resume automatically when the data is available).
When the program (a Windows service included) can do nothing further until the data arrives, there may seem to be little difference between the two, and as far as that program itself is concerned, this is true.
However, the program runs on a thread allocated to it by the operating system (even if it uses only a single thread). Threads are not free: each one reserves stack memory and adds scheduling overhead, and a pool such as the .NET thread pool adds new threads only gradually. So when many threads are tied up, other work can be held up.
When a program blocks, it keeps hold of its thread, making it unavailable for use elsewhere. When it awaits, the thread becomes available for others to use.
So using await can make the whole machine run more efficiently.
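To make the block-versus-await difference concrete, here is a small sketch (the WaitDemo name is hypothetical). AwaitManyAsync runs many concurrent waits with Task.Delay, which is timer-based and holds no thread while waiting; BlockMany does the same with Thread.Sleep on thread-pool threads, each of which stays occupied for the whole wait. Exact timings depend on the machine and thread-pool settings.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo contrasting awaited waits with blocking waits.
public static class WaitDemo
{
    // Awaiting: Task.Delay is backed by a timer, so no thread sits idle per wait.
    public static async Task<TimeSpan> AwaitManyAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => Task.Delay(delayMs)));
        return sw.Elapsed;
    }

    // Blocking: each Thread.Sleep occupies a thread-pool thread for the whole wait.
    public static TimeSpan BlockMany(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        Task.WaitAll(Enumerable.Range(0, count)
            .Select(_ => Task.Run(() => Thread.Sleep(delayMs)))
            .ToArray());
        return sw.Elapsed;
    }
}
```

With a large count, BlockMany typically takes noticeably longer than AwaitManyAsync, because the thread pool adds threads only gradually when all of its threads are blocked.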

Async programming allows efficient use of threads during operations that would otherwise block. Blocking occurs in the UI, but also when performing IO, and therefore when communicating.
If your service does not perform heavy IO and does not use sockets or pipes, you won't gain anything within the service, although I cannot imagine what such a service would do.
Generally speaking, async programming also benefits the hosting system, because it lets your workload run with fewer resources overall. However, as other answers point out, async programming does not create resources: your implementation will use its threads more efficiently (i.e., in a task-oriented way), but you won't magically have more threads available.

The two things aren't related.
Most Windows services don't have a GUI thread, as they don't have a GUI. Instead they have a main thread, and probably many other child threads that implement the service. Any of these threads may want to make use of asynchronous programming techniques. For example, they may be reading from or writing to a socket, a classic use of the asynchronous programming model.

Must async methods be supported by the OS, or is async a program-level feature?

Tutorials sometimes show implementations of your own async methods, for example this code:
async public static Task GetHttpResponseAsync()
{
    using (HttpClient httpClient = new HttpClient())
    {
        HttpResponseMessage response = await httpClient.GetAsync(...);
        Console.WriteLine(response.Something);
    }
}
What is clear to me is how async works in general, but none of the tutorials explains how the following is implemented internally:
httpClient.GetAsync(...);
which is really important for understanding how asynchronous code works in detail. What makes me curious is whether the internal operations of GetAsync (or of other async methods) are registered in some kind of container where this code is executed. Must async methods be supported by the operating system (e.g., does it use the Windows API)? If I wanted to implement my own asynchronous file downloader (from disk, and without the essential parts of the .NET Framework), how would I implement it? Should I register my method somewhere for later invocation?
It's pretty clear to me that internally the compiler builds a state machine, and after the DoSomething() method does what it has to, it just invokes this state machine again to resume executing the code after await.
What is also unclear to me is how async code can run on the same thread. I think the state machine must be maintained on the same thread, but how can the code from httpClient.GetAsync() run on the same thread without interrupting other operations (e.g., the GUI)? There must be something that makes this code run on a separate thread (in all cases). Am I wrong? What did I miss?
Additional explanation of my question: in JavaScript, as far as I know and understand, async methods work by registering them in some kind of container (which runs them one by one on a separate thread) that executes them. After the method completes, the result is returned to the user context. That is clear to me. Does it work the same way here?

In short, true asynchrony must be provided at the OS level. As you've noted, async/await is a language level feature that uses compiler generated state machines to "break up" your method into pieces that can run asynchronously, but it relies on OS primitives (interrupts, threads) to actually perform this work in an asynchronous manner.
However, it's important to note that there will often not be a thread created to handle your async operation. I'll defer to this expertly-written article to describe why this is the case: http://blog.stephencleary.com/2013/11/there-is-no-thread.html
On Windows, the primary mechanism for performing asynchronous work is with I/O Completion Ports. This is a Windows API that is used under the hood by many .NET types, including the HttpClient you're using.
Note that for non-I/O operations, you can also always use Threads, or better yet, the Thread Pool API, to perform background work that will complete asynchronously.
The ReadFile function in the Windows API (or ReadFileEx, which reports completion through a callback routine) is designed to work with async I/O. If you open the file with CreateFile using the FILE_FLAG_OVERLAPPED flag, and then pass an OVERLAPPED structure to ReadFile's lpOverlapped argument, the read is performed asynchronously. I would encourage you to use this API to design your file downloader.
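From C#, you normally don't need to P/Invoke ReadFile yourself to get this behavior. A minimal sketch, assuming a local file path: opening a FileStream with FileOptions.Asynchronous (whose value matches FILE_FLAG_OVERLAPPED on Windows) makes ReadAsync issue overlapped reads there. AsyncFileReader and ReadAllAsync are hypothetical names.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class AsyncFileReader
{
    // Reads a whole file using asynchronous (overlapped, on Windows) file I/O.
    public static async Task<byte[]> ReadAllAsync(string path)
    {
        // FileOptions.Asynchronous asks the OS for true async I/O on this handle.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, bufferSize: 4096,
                                           options: FileOptions.Asynchronous))
        using (var memory = new MemoryStream())
        {
            var buffer = new byte[81920];
            int read;
            // No thread waits while each read is in flight; the method resumes on completion.
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                memory.Write(buffer, 0, read);
            }
            return memory.ToArray();
        }
    }
}
```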
In summary:
There will not always be a thread created for async operations.
async/await is a language feature, but relies on various Windows APIs to achieve true asynchrony.
If the work you are doing is I/O bound, consider using Asynchronous I/O: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365683(v=vs.85).aspx
If the work you are doing is CPU bound, consider using Threading: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684841(v=vs.85).aspx
Now that .NET Core is open source on GitHub, you can inspect the source code yourself to see what's going on under the hood. At the time of writing, the chain was: HttpClient.cs, which uses HttpMessageHandler.cs -> HttpClientHandler.Windows.cs -> WinHttpHandler.cs -> Interop.winhttp.cs -> P/Invoke into the native WinHTTP API DLL -> Winsock sockets with an I/O completion port.

C# Socket performance with .Net 4.5 Async vs [...]Async vs Begin[...]

Currently, from what I've researched, there are 3 ways to work with socket asynchronously:
.Net 4.5 Async example: Using .Net 4.5 Async Feature for Socket Programming (second post)
[...]Async: http://msdn.microsoft.com/en-us/library/system.net.sockets.socketasynceventargs.aspx
Begin[...]: http://msdn.microsoft.com/en-us/library/5w7b7x5f(v=vs.110).aspx
I am very confused by all the options .NET provides for working with asynchronous sockets. Why should I use one or the other? Which is the best choice for performance with thousands of simultaneous connections?

Methods using SocketAsyncEventArgs most closely match the underlying Windows technology (I/O Completion Ports). They are essentially a bare-metal wrapper designed to perform zero allocation and extract the highest performance at the cost of a less friendly API. This has a disadvantage of more tightly coupled code as it doesn't implement any standard Stream API. The other async socket methods all wrap this one.
Methods using a Begin/End pair are using what's called the Asynchronous Programming Model (APM). APM is the original async model of .NET. It's very easy to write spaghetti code if you use it half-cocked, but it's functional and fairly simple to use once you have some experience with it. They shouldn't see much use in modern .NET, though, because we've got something far easier and better performing:
Methods returning a Task are using the Task-based Asynchronous Pattern (TAP). Tasks are a pure upgrade to APM: they're more flexible, easier to compose, and should generally have equal or better performance. When combined with language-integrated async/await, you can write code that performs great and is significantly easier to understand and maintain.
tl;dr use Task methods, unless you've got a requirement of extreme perf. Then use SocketAsyncEventArgs methods. Don't use APM methods.
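A minimal sketch of the SocketAsyncEventArgs style, assuming .NET Framework 4.6 or later (for TaskCreationOptions.RunContinuationsAsynchronously) or .NET Core. It shows the detail that most often trips people up: ReceiveAsync returns false when the operation completed synchronously, and in that case the Completed event is not raised. For brevity this sketch allocates a new SocketAsyncEventArgs per call and wraps it in a Task, which gives up the zero-allocation benefit; a high-performance server would instead reuse SocketAsyncEventArgs instances from a pool. SaeaReceive is a hypothetical name.

```csharp
using System.Net.Sockets;
using System.Threading.Tasks;

public static class SaeaReceive
{
    public static async Task<int> ReceiveOnceAsync(Socket socket, byte[] buffer)
    {
        // Keep awaiting code from running inline on the I/O completion callback.
        var tcs = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);
        using (var args = new SocketAsyncEventArgs())
        {
            args.SetBuffer(buffer, 0, buffer.Length);
            args.Completed += (sender, e) => Complete(e, tcs);

            // false means the receive finished synchronously: Completed will NOT fire,
            // so the result must be handled right here.
            if (!socket.ReceiveAsync(args))
            {
                Complete(args, tcs);
            }
            return await tcs.Task;
        }
    }

    // Translates the finished operation into the Task's result or exception.
    private static void Complete(SocketAsyncEventArgs e, TaskCompletionSource<int> tcs)
    {
        if (e.SocketError == SocketError.Success)
            tcs.TrySetResult(e.BytesTransferred);
        else
            tcs.TrySetException(new SocketException((int)e.SocketError));
    }
}
```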

What better choice to have performance with thousands of simultaneous connections?
...
A curiosity regarding the Begin[...]. If I have a MMORPG server where one connection interacting with each other for position update, animation, effects (basic MMORPG mechanism), in numbers, which would be "heavily loaded servers"? 200~300 simultaneous connections?
On the server side, you may benefit equally well from using any of the asynchronous socket APIs, whether Begin/End-style APM ones, event-based EAP ones or Task-based TAP ones. That's because you'll be blocking fewer threads than with the synchronous APIs. So more threads will be available to concurrently serve other incoming requests to your server, thus increasing its scalability.
Most likely, you won't see any performance advantage from using TAP socket APIs over their APM or EAP analogues. However, the TAP pattern is much easier to develop with than APM or EAP. When used with async/await, it produces shorter, more readable and less error-prone code. You get a natural pseudo-linear code flow, which is not otherwise possible with APM callbacks or EAP event handlers. If you're unable to find a proper Task-based socket API, you can always make one yourself from a Begin/End APM API with Task.Factory.FromAsync (or from an EAP API, check "A reusable pattern to convert event into task").
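For example, here is a sketch of wrapping the APM pair Socket.BeginReceive/EndReceive with Task.Factory.FromAsync. BeginReceive takes more arguments than FromAsync's typed overloads accept, so a lambda captures them and forwards only the callback and state. The extension method name ReceiveTaskAsync is hypothetical (chosen so it does not collide with Socket.ReceiveAsync).

```csharp
using System.Net.Sockets;
using System.Threading.Tasks;

public static class SocketApmExtensions
{
    // Turns one BeginReceive/EndReceive round trip into a Task<int> (bytes received).
    public static Task<int> ReceiveTaskAsync(this Socket socket, byte[] buffer, int offset, int size)
    {
        return Task.Factory.FromAsync<int>(
            // Begin half: captured arguments plus the callback/state FromAsync supplies.
            (callback, state) => socket.BeginReceive(buffer, offset, size, SocketFlags.None, callback, state),
            // End half: called by FromAsync when the operation completes.
            socket.EndReceive,
            null);
    }
}
```

A caller can then write int n = await socket.ReceiveTaskAsync(buffer, 0, buffer.Length); inside any async method.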
When it comes to a client-side UI app, scalability is not that important, but the TAP pattern has another benefit. With little effort, it helps keep your UI responsive, because you won't be blocking the UI thread (which is what usually happens while waiting for the result of a synchronous call). This is not specific to the Task-based Socket API; it applies to any Task-based API, e.g., Task.Delay() or Stream.ReadAsync().
For some good reading materials on asynchronous programming in C#, check the async/await tag wiki:
https://stackoverflow.com/tags/async-await/info

If you have the chance of using .NET 4.5 and async/await, I totally recommend it.
Basically there are these ways of doing multithreading in .NET:
Thread.
ThreadPool.QueueUserWorkItem.
XXXAsync method and the XXXCompleted event.
BeginXXX and EndXXX methods.
Task Parallel Library.
async/await
The first option is raw threads, which you should avoid because creating threads is an expensive operation. The rest are just different ways of using the ThreadPool, a component responsible for maintaining a collection of threads that can be used to schedule your tasks, which yields better performance than the first option.
They use different syntaxes, but to me the clearest is async/await. I recently created a WebSocket connector using sockets and async/await, and the performance is quite good. Technically, async/await does not give you a performance boost, but the clarity of the code lets you streamline the design of your application, and that may give a good performance boost compared with messy continuation-based code.
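To see how the options differ in shape, here is a sketch that runs the same trivial computation through four of the mechanisms listed above: a raw Thread, ThreadPool.QueueUserWorkItem, Task.Run (the TPL) and async/await. WaysToRunWork and Work are hypothetical names, and the computation is a placeholder.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class WaysToRunWork
{
    // Placeholder for real work; the value is arbitrary.
    private static int Work() { return 6 * 7; }

    // Raw thread: a dedicated OS thread is created (expensive) for a single job.
    public static int ViaThread()
    {
        int result = 0;
        var thread = new Thread(() => result = Work());
        thread.Start();
        thread.Join(); // wait for the thread to finish before reading result
        return result;
    }

    // ThreadPool.QueueUserWorkItem: reuses a pool thread, but completion must be signalled by hand.
    public static int ViaThreadPool()
    {
        int result = 0;
        using (var done = new ManualResetEventSlim())
        {
            ThreadPool.QueueUserWorkItem(_ => { result = Work(); done.Set(); });
            done.Wait();
        }
        return result;
    }

    // TPL: Task.Run schedules the work on the pool and carries the result in a Task<int>.
    public static Task<int> ViaTpl()
    {
        return Task.Run(() => Work());
    }

    // async/await: the caller is released while the pool runs Work, then resumes here.
    public static async Task<int> ViaAsyncAwait()
    {
        int value = await Task.Run(() => Work());
        return value;
    }
}
```

Note how the raw Thread and QueueUserWorkItem versions need manual signalling (Join, a ManualResetEventSlim) to get a result back, while the Task-based versions return it directly.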

First, you might want to check out this article on MSDN about the differences between the various async programming mechanisms in .NET.
Begin[…] was the first async socket implementation, using APM (Asynchronous Programming Model). It takes a callback as one of its arguments. While somewhat dated compared to newer methods, it works fine if you don't mind dealing with callbacks and the messy code they can create. There's also some extra overhead, because each call allocates an IAsyncResult object (plus any state object you pass), and on heavily loaded servers this can start to become a problem.
[…]Async uses the newer event-based model and is also a lighter implementation, designed to help deal with the high-traffic issues Begin[…] has. It works nicely, but can also result in messy code if you aren't careful. Oh yeah, there's a bug you can read about here, though it's likely something you won't care about unless you're building a very high-performance piece of software.
Task-based asynchronous programming (TAP, built on the TPL) is the newest mechanism and, with the help of the async/await keywords, can achieve most (if not all) of the efficiency of […]Async while offering much easier-to-understand code. Tasks also make it much easier to wait for multiple operations to finish at once. It's important to note that, while several native .NET functions implement TAP and return a Task, at the time of writing (.NET 4.5) there wasn't one yet for Socket operations. There are examples online of how to build one, but it requires a bit of extra work. (Later .NET versions added Task-returning overloads such as Socket.ReceiveAsync and Socket.ConnectAsync.)

Asynchronous methods(!) clarification in .NET?

I've been reading a lot about this topic lately, and I still need to clarify something.
The whole idea of asynchronous methods is thread economy:
allowing many tasks to run on a few threads. This is done by letting the hardware driver do the job while releasing the thread back to the thread pool so it can serve other jobs.
Please note:
I'm not talking about asynchronous delegates, which tie up another thread (executing a task in parallel with the caller).
However, I've seen two main kinds of asynchronous method examples:
Code samples (from books) that only use existing asynchronous I/O operations such as BeginXXX/EndXXX, e.g., Stream.BeginRead.
(I couldn't find any asynchronous method samples that don't use existing .NET I/O operations such as Stream.BeginRead.)
Code samples like this (and this), which don't actually invoke an asynchronous operation (although the author thinks they do, they actually cause a thread to block!).
Question:
Are asynchronous methods used only with existing .NET I/O methods like BeginXXX and EndXXX?
I mean, if I want to create my own asynchronous methods like BeginMyDelay(int ms,...){..} and EndMyDelay(...), I couldn't do it without tying a blocked thread to it... correct?
Thank you very much.
P.S. Please note that this question is tagged .NET 4, not .NET 4.5.

You're talking about APM.
APM makes wide use of an OS concept known as I/O completion ports. That's why IO operations are the best candidates for APM.
You could write your own APM methods.
But in practice, these methods will either be built on top of existing APM methods, or they will be IO-bound and use some native OS mechanism (like FileStream, which uses overlapped file IO).
For compute-bound asynchronous operations, APM will only increase complexity, IMO.
A bit more clarification.
Working with hardware is asynchronous by nature. Hardware needs time to perform a request: the network card must send or receive data, the HDD must read or write, etc. If IO is synchronous, the thread that issued the IO request waits for the response. This is where APM helps: "Don't wait, just do something else, and when the IO completes, I'll call you," says APM.
The main point: the operation is performed outside of the CPU.
When you're writing a compute-bound operation, which uses the CPU for its execution without any IO, there's nothing to wait for. So APM can't help: if you need CPU, you need a thread, and so you need the thread pool.
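To address the BeginMyDelay question directly, here is a sketch of a hypothetical BeginMyDelay/EndMyDelay pair that uses only .NET 4 APIs. The wait is supplied by a System.Threading.Timer, so no thread is blocked while the delay runs, and the Task from a TaskCompletionSource serves as the IAsyncResult (Task implements IAsyncResult). This is one possible approach, not how the framework's own APM methods are built.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class MyDelay
{
    // Keeps pending timers reachable: an unreferenced Timer can be garbage-collected before it fires.
    private static readonly HashSet<Timer> Pending = new HashSet<Timer>();

    public static IAsyncResult BeginMyDelay(int ms, AsyncCallback callback, object state)
    {
        // Passing state makes it the Task's AsyncState, as APM callers expect.
        var tcs = new TaskCompletionSource<object>(state);
        Timer timer = null;
        // Created disarmed so 'timer' is assigned before the callback can possibly run.
        timer = new Timer(_ =>
        {
            lock (Pending) Pending.Remove(timer);
            timer.Dispose();
            tcs.TrySetResult(null);
            if (callback != null) callback(tcs.Task);
        }, null, Timeout.Infinite, Timeout.Infinite);
        lock (Pending) Pending.Add(timer);
        timer.Change(ms, Timeout.Infinite); // arm: fire once after ms milliseconds
        return tcs.Task;
    }

    public static void EndMyDelay(IAsyncResult asyncResult)
    {
        // Blocks only if called before completion; surfaces any fault as an exception.
        ((Task)asyncResult).Wait();
    }
}
```

The callback runs on a thread-pool thread when the timer fires; callers then call EndMyDelay, as with any APM pair.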

I think, but I'm not sure, that you can create your own asynchronous methods, for example by creating a new thread and waiting for it to finish some work (a DB query, ...).
In terms of overall system performance it is probably not useful since, as you say, you just create another thread. But if you work on IIS, for example, the original request thread can be used for other requests while you wait for the 'background' operation.
I think IIS has a fixed number of threads (a thread pool), so in this case it can be useful.

I mean , If I want to create my own asynchronous methods like BeginMyDelay(int ms,...){..} , EndMyDelay(...). I couldn't done it without tie a blocked thread to it....correct?
While I've not dug into the implementation of async, I can't see any reason why one couldn't do this.
The simplest way would be to use existing libraries that help [e.g. timers] or some sort of event system IIRC.
However even if you don't want to use any library helpers then you're stuck with a problem... the 'blocked thread'.
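Sure, the code would look something like this:
while (true)
{
    // Snapshot the ready items first: removing from WaitingTasks while
    // enumerating it directly would throw an InvalidOperationException.
    foreach (var item in WaitingTasks.Where(t => t.Ready()).ToList())
    {
        /*fire item*/
        WaitingTasks.Remove(item);
    }
    /*Some blocking action*/
}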
Thing is - 'Some blocking action' doesn't have to be 'blocking'. You could yield/sleep the thread, or use it to process some data. For example, the Unity Game Engine does a similar thing with Coroutines - where the same thread that processes all the code also checks to see if various coroutines [that have been delayed due to time] need to be updated. Replace /*Some blocking action*/ with ProcessGameLoop().
Hope that helps; feel free to ask questions, post corrections, etc.
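Along the lines of the loop described above, here is a sketch of a hypothetical single-threaded scheduler that hands out Tasks and completes them from the loop itself, so waiting code can use await without any thread blocking per delay. It is deliberately not thread-safe: it assumes DelayUntil and RunOnce are only called from the loop's thread. The current time is passed in explicitly to keep it deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class LoopScheduler
{
    private readonly List<KeyValuePair<DateTime, TaskCompletionSource<bool>>> _waiting =
        new List<KeyValuePair<DateTime, TaskCompletionSource<bool>>>();

    // Registers a delay that completes once the loop observes a time >= due.
    public Task DelayUntil(DateTime due)
    {
        var tcs = new TaskCompletionSource<bool>();
        _waiting.Add(new KeyValuePair<DateTime, TaskCompletionSource<bool>>(due, tcs));
        return tcs.Task;
    }

    // One loop iteration: complete everything that is due and keep the rest.
    // Items are removed before completion, so continuations may safely schedule new delays.
    public int RunOnce(DateTime now)
    {
        List<KeyValuePair<DateTime, TaskCompletionSource<bool>>> due = _waiting.FindAll(w => w.Key <= now);
        _waiting.RemoveAll(w => w.Key <= now);
        foreach (var w in due) w.Value.TrySetResult(true); // continuations run inline, on the loop thread
        return due.Count;
    }
}
```

In a real loop you would call RunOnce(DateTime.UtcNow) on each iteration, followed by the application's own work (for example the ProcessGameLoop() mentioned above).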

call to async method vs event

I am developing a client library for a network application protocol.
Client code calls the library to initialize it and to connect to the server.
The client can of course send requests to the server, but the server can also send requests (commands, called Cmd below) to the client.
The transport protocol is TCP/IP, so basically the client library connects to the server and calls an async method to retrieve the next request or response from the server, to avoid blocking on I/O while waiting for responses/requests from the server.
That being said, I see two possible solutions (using only C# constructs and no specific third-party framework) in the library for letting the client receive requests from the server:
Either offer an event in the library, such as
public event EventHandler<ReceivedCmdEventArgs> ReceivedCmd;
that the client would subscribe to in order to get notified of requests coming from the server.
Of course, for this mechanism I would have to write an async loop in the client library that receives requests from the server and raises the event on Cmd reception.
Or the other solution would be to provide a method like this in the client library
public async Task<Cmd> GetNextCmdAsync()
that the client code would call in an async loop to receive the cmds.
Are these solutions essentially the same? Is it better to fully use the async/await constructs of C# 5 and not rely on events anymore? What are the differences? Any recommendations or remarks?
Thanks !

I think that the event-driven approach is better in your case.
In fact, you're talking about an observable/observer pattern. An unknown number of listeners/observers are waiting to do something if some command is received.
The async/await pattern wouldn't work as well as the event-driven approach, because it amounts to "I expect one result" as opposed to "I'll do what you want whenever you report that you received a command."
Conceptually speaking, I prefer the event-driven approach because it fits the goal of your architecture better.
The async/await pattern in C# 5 isn't designed for your case; it's meant for when some code executes an async task and the following lines should run after the task has produced a result.
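A minimal sketch of the event-driven approach, assuming simple Cmd and ReceivedCmdEventArgs types like those named in the question (their members here are hypothetical). The library's internal async loop reads commands and raises ReceivedCmd for each one; readNextAsync is a hypothetical stand-in for reading and parsing the next command from the socket, returning null when the connection closes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class Cmd
{
    public Cmd(string name) { Name = name; }
    public string Name { get; }
}

public sealed class ReceivedCmdEventArgs : EventArgs
{
    public ReceivedCmdEventArgs(Cmd cmd) { Cmd = cmd; }
    public Cmd Cmd { get; }
}

public sealed class CmdClient
{
    public event EventHandler<ReceivedCmdEventArgs> ReceivedCmd;

    // The library's receive loop: await the next command, then notify all subscribers.
    public async Task RunReceiveLoopAsync(Func<Task<Cmd>> readNextAsync, CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            Cmd cmd = await readNextAsync();
            if (cmd == null) break; // connection closed
            ReceivedCmd?.Invoke(this, new ReceivedCmdEventArgs(cmd));
        }
    }
}
```

Note that handlers run on whichever thread the loop happens to resume on, so subscribers must be prepared for that.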

Task represents a single asynchronous action, such as receiving a single command. As such, it is not directly suitable for streams of events.
The ultimate library for streams of events is Reactive Extensions (Rx), but it unfortunately has a rather steep learning curve.
A newer option is the lesser-known TPL Dataflow, which allows building up async-friendly dataflow meshes. In fact, I'm writing an async-friendly TCP/IP socket wrapper, and I'm exposing ISourceBlock<ArraySegment<byte>> as the reading stream. The end-user can then either receive from that block directly (ReceiveAsync), or they can just "link" it into a dataflow of their own (e.g., that can do message framing, parsing, and even handling).
Dataflow is slightly less efficient than Rx, but I think the lower learning curve is worth it.
I would not recommend a bare event - you either end up with a free-threaded event (think about how you would handle socket closure - could an event happen after disposal?) or the Event-based Asynchronous Pattern (which has its own similar problems syncing to the provided SynchronizationContext). Both Rx and Dataflow provide better solutions for synchronization and disposal/unsubscription.
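For comparison, here is a sketch of the pull-based GetNextCmdAsync style using only the base class library (ConcurrentQueue plus SemaphoreSlim). It is a hypothetical, much simpler stand-in for a dataflow buffer: TPL Dataflow's BufferBlock<T> offers a similar Post/ReceiveAsync shape, but also supports completion and linking, which this sketch does not.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class AsyncQueue<T>
{
    private readonly ConcurrentQueue<T> _items = new ConcurrentQueue<T>();
    private readonly SemaphoreSlim _available = new SemaphoreSlim(0);

    // Producer side (e.g., the socket receive loop): one semaphore permit per queued item.
    public void Post(T item)
    {
        _items.Enqueue(item);
        _available.Release();
    }

    // Consumer side: asynchronously waits for an item without blocking a thread.
    public async Task<T> ReceiveAsync(CancellationToken token = default(CancellationToken))
    {
        await _available.WaitAsync(token);
        T item;
        _items.TryDequeue(out item); // succeeds: each permit corresponds to one enqueued item
        return item;
    }
}
```

The receive loop would call Post for each parsed command, and client code would await ReceiveAsync in its own loop; signalling that the connection has closed would still need to be added.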

Since you are making a library, events seem better suited.
Events allow you to build the library without requiring that a callback be specified.
Consumers of your library decide what they are interested in and listen to those events.
Async tasks, on the other hand, are meant for cases where you know there will be delays (IO, network, etc.). They let you free resources while these delays take place, resulting in better utilization of resources.
Async tasks are not a replacement for events that you raise.
