In C#, when receiving network data with the BeginReceive/EndReceive methods, is there any reason you shouldn't process the packets as soon as you receive them? Some of the tasks can be fairly CPU-intensive. I ask because I've seen some implementations that push the packets off into a processing queue and then handle them there. To me this seems redundant because, as far as I know, the async methods also operate on a thread pool.
Generally, you need to receive 'enough' packets to have a data item that is 'processable'.
IMO, it's better to have one thread whose job is receiving data, and another to actually process it.
As Mitch points out, you need to be able to receive enough packets to have a complete message/frame. But there's no reason why you shouldn't start processing that frame immediately and issue another BeginReceive. In fact, if you believe your processing could take some time, you're better off handing it off to the worker thread pool rather than blocking a thread from the I/O pool (which is where your callback will fire).
In addition, unless you're expecting a low number of connections, spawning a thread to handle each connection is not a very scalable approach, although it does have the benefit of some simplicity.
I recently wrote an article on pipelining data-processing off a network socket, which you can find here.
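As a rough illustration of "receive enough bytes for a complete frame, then hand it off", here is a minimal sketch. The FrameAssembler class and the wire format (a 4-byte little-endian length prefix followed by the payload) are assumptions made for this example; the posts above do not specify any framing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: accumulates raw bytes and returns complete frames.
// Assumed wire format (not from the original posts): a 4-byte little-endian
// length prefix followed by that many payload bytes.
public sealed class FrameAssembler
{
    private readonly List<byte> _pending = new List<byte>();

    // Append the bytes from one receive call and return every frame that is
    // now complete. Leftover bytes stay buffered for the next call.
    public List<byte[]> Append(byte[] buffer, int count)
    {
        for (int i = 0; i < count; i++)
        {
            _pending.Add(buffer[i]);
        }

        var frames = new List<byte[]>();
        while (_pending.Count >= 4)
        {
            int length = _pending[0]
                       | (_pending[1] << 8)
                       | (_pending[2] << 16)
                       | (_pending[3] << 24);
            if (length < 0)
            {
                throw new InvalidOperationException("Negative frame length.");
            }
            if (_pending.Count - 4 < length)
            {
                break; // the frame is not complete yet
            }
            frames.Add(_pending.GetRange(4, length).ToArray());
            _pending.RemoveRange(0, 4 + length);
        }
        return frames;
    }
}
```

In a receive callback you would call Append with the byte count returned by EndReceive, pass each returned frame to something like ThreadPool.QueueUserWorkItem or Task.Run, and then issue the next BeginReceive.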
I'm playing with SocketAsyncEventArgs and IO Completion Ports.
I've been looking but I can't seem to find how .NET handles race conditions.
Need clarification on this stack overflow question:
https://stackoverflow.com/a/28690948/855421
As a side note, don't forget that your request might have completed synchronously. Perhaps you're reading from a TCP stream in a while loop, 512 bytes at a time. If the socket buffer has enough data in it, multiple ReadAsyncs can return immediately without doing any thread switching at all. [emphasis mine]
For the sake of simplicity, let's assume one client and one server. The server is using an IOCP. If the client is a fast writer but the server is a slow reader, does IOCP mean the kernel/underlying process can signal multiple threads?
1 So, socket reads 512 bytes, kernel signals an IOCP thread
2 Server processes new bytes
3 socket receives another X bytes but server is still processing previous buffer
Does the kernel spin up another thread? SocketAsyncEventArgs has a Buffer which by definition is: "Gets the data buffer to use with an asynchronous socket method." So the buffer should not change over the lifetime of the SocketAsyncEventArgs if I understand that correctly.
What's preventing SocketAsyncEventArgs.Buffer from getting corrupted by IOCP thread 2?
Or does the .NET framework synchronize IOCP threads? If so, what's the point of spinning up a new thread then if IOCP thread 1 blocks the previous read?
I've been looking but I can't seem to find how .NET handles race conditions.
For the most part, it doesn't. It's up to you to do that. But, it's not clear from your question that you really have a race condition problem.
You are asking about this text, in the other answer:
If the socket buffer has enough data in it, multiple ReadAsyncs can return immediately without doing any thread switching at all
First, to be clear: the method's name is ReceiveAsync(), not ReadAsync(). Other classes, like StreamReader and NetworkStream have ReadAsync() methods, and these methods have very little to do with what your question is about. Now, that clarified…
That quote is about the opposite of a race condition. The author of that text is warning you that, should you happen to call ReceiveAsync() on a socket that already has data ready to be read, the data will be read synchronously and the SocketAsyncEventArgs.Completed event will not be raised later. It will be the responsibility of the thread that called ReceiveAsync() to also process the data that was read.
All of this would happen in a single thread. There wouldn't be any race condition in that scenario.
Now, let's consider your "fast writer, slow reader" scenario. The worst that can happen there is that the first read, which could take place in any thread, does not complete immediately, but by the time the Completed event is raised, the writer has overrun the reader's pace. In this case, since part of handling the Completed event is likely to be calling ReceiveAsync() again, which now will return synchronously, an IOCP thread pool thread will get tied up looping on the calls to ReceiveAsync(). No new thread is needed, because the current IOCP thread is doing all the work synchronously. But it does prevent that thread from handling other IOCP events.
All that will mean though, is that if you have some other socket the server is handling and which also needs to call ReceiveAsync(), the framework will have to ensure there's another thread in the IOCP thread pool available to handle that I/O. But, that's a completely different socket and you would necessarily be using a completely different buffer for that socket anyway.
Again, no race condition.
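To make the synchronous-completion case concrete, here is a minimal sketch of a single-outstanding-read loop. The SingleReadReceiver class, its 512-byte buffer, and the "stop after an expected byte count" rule are assumptions for illustration only. The key point is that ReceiveAsync() returns false when it completed synchronously, in which case Completed is not raised and the calling thread handles the data itself.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

// Hypothetical sketch: exactly one read is outstanding at any time, so the
// buffer is never touched by two threads at once.
public sealed class SingleReadReceiver
{
    private readonly Socket _socket;
    private readonly int _expected;
    private readonly ManualResetEventSlim _done = new ManualResetEventSlim(false);
    private int _total;

    public SingleReadReceiver(Socket socket, int expected)
    {
        _socket = socket;
        _expected = expected;
    }

    // Blocks until `expected` bytes have arrived or the connection ends.
    public int Run()
    {
        using (var args = new SocketAsyncEventArgs())
        {
            args.SetBuffer(new byte[512], 0, 512);
            // Asynchronous completion: raised later on an I/O pool thread.
            args.Completed += (sender, e) =>
            {
                if (Handle(e)) Pump(e);
            };
            Pump(args);
            _done.Wait();
            return _total;
        }
    }

    private void Pump(SocketAsyncEventArgs args)
    {
        // false means "completed synchronously": Completed will NOT fire,
        // so this thread processes the data and loops to read again.
        while (!_socket.ReceiveAsync(args))
        {
            if (!Handle(args)) return;
        }
    }

    // Returns true if another read should be issued.
    private bool Handle(SocketAsyncEventArgs args)
    {
        if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
        {
            _done.Set();
            return false;
        }
        _total += args.BytesTransferred;
        if (_total >= _expected)
        {
            _done.Set();
            return false;
        }
        return true;
    }
}
```

Because the next read is only issued after the previous one has been handled, the same buffer can be reused safely, whichever thread ends up running the handler.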
Now, all that said, if you want to get really confused, it is possible to use asynchronous I/O in the .NET Socket API (whether with BeginReceive() or ReceiveAsync() or even wrapping the socket in a NetworkStream and using ReadAsync()) in such a way that you do have a race condition for a particular socket.
I hesitate to even mention it, because there's no evidence in your question this pertains to you at all, nor that you're even really interested in having this level of detail. Adding this explanation could just confuse things. But, for the sake of completeness…
It is possible to have issued more than one read operation on a socket at any given time. This would be somewhat akin to double- or triple-buffered video display (if you're familiar with that concept). The idea being that you might still be handling a read operation while new data comes in, and it would be more performant to have a new read operation already in progress to handle that data before you're done handling the current read operation.
This sounds great, but in practice because of the way Windows schedules threads, and in particular does not guarantee a particular ordering of thread scheduling, if you try to implement your code that way, you create the possibility that your code will see read operations completed out of order. That is, if you for example call ReceiveAsync() twice in a row (with two different SocketAsyncEventArgs objects and two different buffers, of course), your Completed event handler might get called with the second buffer first.
This isn't because the read operations themselves complete out of order. They don't. Hence the emphasis on "your" above. The problem is that while the IOCP threads handling the IO completions become runnable in the correct order (because the buffers are filled in the order you provided them by calling ReceiveAsync() multiple times), the second IOCP thread to become runnable could wind up being the first thread to actually be scheduled to run by Windows.
This is not hard to deal with. You just have to make sure that you track the buffer sequence as you issue the read operations, so that you can reassemble the buffers in the correct order later. All of the async options available provide a mechanism for you to include additional user state data (e.g. SocketAsyncEventArgs.UserToken), so you can use this to track the order of your buffers.
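One way to track the buffer sequence is sketched below. The OrderedCompletionBuffer class is hypothetical and not part of .NET; the sequence number it hands out is the kind of value you might store in SocketAsyncEventArgs.UserToken when issuing each read.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: releases buffers in the order the reads were issued,
// even if the completion handlers run out of order.
public sealed class OrderedCompletionBuffer
{
    private readonly object _gate = new object();
    private readonly Dictionary<long, byte[]> _waiting = new Dictionary<long, byte[]>();
    private long _nextToIssue;
    private long _nextToDeliver;

    // Call when issuing a read; keep the result with the operation,
    // e.g. in SocketAsyncEventArgs.UserToken.
    public long NextSequence()
    {
        lock (_gate)
        {
            return _nextToIssue++;
        }
    }

    // Call from the completion handler. Returns the buffers that can now be
    // processed, in issue order. The list is empty if an earlier read has
    // not been reported yet.
    public List<byte[]> Complete(long sequence, byte[] data)
    {
        lock (_gate)
        {
            _waiting[sequence] = data;
            var ready = new List<byte[]>();
            byte[] next;
            while (_waiting.TryGetValue(_nextToDeliver, out next))
            {
                _waiting.Remove(_nextToDeliver);
                _nextToDeliver++;
                ready.Add(next);
            }
            return ready;
        }
    }
}
```

Note that the returned buffers still need to be consumed in order, for example by appending them to a single queue while holding a lock, since two handlers could otherwise process their ready lists concurrently.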
Again, this is not common. For most scenarios, a completely orderly implementation, where you only issue a new read operation after you're completely done with the current read operation, is completely sufficient. If you're worried at all about getting a multi-buffer read implementation correct, just don't bother. Stick with the simple approach.
I am still learning C# so please be easy on me. I am thinking about an application I am working on and I can't seem to figure out the best approach. This is not a forms application but rather a console one. I am listening to a UDP port. I get UDP messages as fast as 10 times per second. I then look for a trigger in the UDP message. I am using an event handler that is raised each time I get a new UDP packet, which then calls methods to parse the packet and look for my trigger. So, I have these questions.
With regard to threading, I assume a thread like my thread that listens to the UDP data should be a permanent thread?
Also on threading, when I get my trigger and decide to do something, in this case send a message out, I gather that I should use a thread pool each time I want to perform this task?
On thread pools, I am reading that they are not very high priority, is that true? If the message I need to send out is critical, can I rely on thread pools?
With the event handler which is raised when I get a UDP packet and then calls methods, what is the best way to ensure my methods all complete before the next packet/event is raised? At times I see event queue problems because if any of the methods take a bit longer than they should (for example writing to a DB) and the next packet comes in 100ms later, you get event queue growth because you cannot consume events in a timely manner. Is there a good way to address this?
With regard to threading, I assume a thread like my thread that listens to the UDP data should be a permanent thread?
There are no permanent threads. However there should be a thread that is responsible for receiving. Once you start it, let it run until you no longer need to receive any messages.
Also on threading, when I get my trigger and decide to do something, in this case send a message out, I gather that I should use a thread pool each time I want to perform this task?
That depends on how often you send out messages. If your situation is more like producer/consumer, then a separate thread for sending is a good idea. But if you only rarely send out a message, you can use the thread pool. I can't define what "rarely" means in this case; you should watch your app and decide.
On thread pools, I am reading that they are not very high priority, is that true? If the message I need to send out is critical, can I rely on thread pools?
You can. It's more likely that your message will be delayed by slow message processing or a slow network than by the thread pool.
With the event handler which is raised when I get a UDP packet and then calls methods, what is the best way to ensure my methods all complete before the next packet/event is raised? At times I see event queue problems because if any of the methods take a bit longer than they should (for example writing to a DB) and the next packet comes in 100ms later, you get event queue growth because you cannot consume events in a timely manner. Is there a good way to address this?
A queue is a perfect solution. You can have more than one queue if some messages are independent of others and their execution won't collide, and then process those queues in parallel.
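A minimal sketch of such a queue, using BlockingCollection<T> from System.Collections.Concurrent. The PacketQueue class, its handler delegate, and the default capacity of 1024 are assumptions for this example. The receiving code only enqueues; a single dedicated consumer handles packets in arrival order at its own pace.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical wrapper: the receiver enqueues packets, one consumer task
// processes them in order.
public sealed class PacketQueue : IDisposable
{
    private readonly BlockingCollection<byte[]> _queue;
    private readonly Task _consumer;

    public PacketQueue(Action<byte[]> handler, int capacity = 1024)
    {
        // A bounded queue keeps memory in check if the consumer falls behind.
        _queue = new BlockingCollection<byte[]>(capacity);
        _consumer = Task.Factory.StartNew(() =>
        {
            // Blocks while the queue is empty; ends after CompleteAdding().
            foreach (var packet in _queue.GetConsumingEnumerable())
            {
                handler(packet);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Returns false if the queue is full, so the caller can decide whether
    // to drop the packet, log it, or retry.
    public bool TryEnqueue(byte[] packet)
    {
        return _queue.TryAdd(packet);
    }

    // Stops accepting packets and waits for the consumer to drain the queue.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _consumer.Wait();
        _queue.Dispose();
    }
}
```

With this shape, a slow database write delays only the consumer; the UDP receive path returns as soon as the packet has been queued.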
I'll address your points:
(1) - Your listening thread must be a 'permanent' thread that gets messages and distributes them.
(2+3) - Look at the TPL library; you should use it instead of working with threads and thread pools directly (unless you need fine control over the operations which, from your question, it seems you don't) - as MSDN states:
The Task Parallel Library (TPL) is based on the concept of a task, which represents an asynchronous operation. In some ways, a task resembles a thread or ThreadPool work item, but at a higher level of abstraction
(4) - Look into using message queues, since what you need is a place to receive messages, store them for some time (in memory in your case) and handle them at your own pace.
You could implement this yourself, but you'll find it gets complicated quickly.
I recommend looking into NetMQ - it's easy to use, especially for what you describe, and it's in C#.
I am implementing a TCP client in my Unity3D game and I am wondering if it's actually safe or not to call the NetworkStream.BeginWrite without waiting until the previous call finishes writing.
From what I understood while reading the documentation, it's safe as long as I am not performing concurrent BeginWrite calls from different threads (and Unity has only one thread for the game main loop).
For reading, I call BeginRead right after making the connection, with an asynchronous callback in which I read the incoming data from TcpClient.GetStream(), put it into a separate MemoryStream under lock(readMemoryStream), and call BeginRead again. Besides that, in my Update() function (in the main game thread) I check for new data in readMemoryStream, check for a complete message and unpack it (using the same lock(readMemoryStream) of course), and perform operations on the game objects based on the message from the server.
Will this approach work fine? Won't BeginRead interfere with BeginWrite?
Again, I am using callback thread to read the data and main thread to write.
As long as no two threads are calling BeginWrite() concurrently, all is well. The same thread, or even other threads, can call BeginWrite() consecutively before earlier calls have completed.
Do note that the completion callbacks might be executed out of order; if you do implement it this way and the order of the execution of the completion callbacks matters, it is up to you to keep track of which asynchronous operation is which. Of course, for writing to the socket, this often doesn't matter, as you may not have anything to do in the completion callback other than to call EndWrite().
Reading from and writing to a socket are completely independent operations. The socket is full-duplex and can safely handle concurrently pending read and write operations on the same socket.
You didn't ask, but like BeginWrite(), you can also call BeginRead() multiple times without earlier operations completing. And again, as with BeginWrite(), it's up to you to keep track of the correct order of the operations so that when your completion callback is executed for each one, you know which order the received data should be in.
Note that since the order of the completions is critical for read operations (something often not the case for write operations), it is common for all but the largest-scale implementations to never overlap read operations on a given socket. The code is much simpler when for a given socket, only one read operation is in progress at a time.
One last caveat: do note that your buffers are pinned for the duration of the I/O operation. Too many outstanding I/O operations can interfere with the efficient management of the heap, due to fragmentation. This is unlikely to be an issue in a client implementation, but a large-scale server implementation should take this into account (e.g. by allocating large buffers so that they come from the LOH, which is not compacted by default, so pinning objects there costs little extra).
Everything that I read about sockets in .NET says that the asynchronous pattern gives better performance (especially with the new SocketAsyncEventArgs which saves on the allocation).
I think this makes sense if we're talking about a server with many client connections where it's not possible to allocate one thread per connection. Then I can see the advantage of using the ThreadPool threads and getting async callbacks on them.
But in my app, I'm the client and I just need to listen to one server sending market tick data over one tcp connection. Right now, I create a single thread, set the priority to Highest, and call Socket.Receive() with it. My thread blocks on this call and wakes up once new data arrives.
If I were to switch this to an async pattern so that I get a callback when there's new data, I see two issues
The threadpool threads will have default priority so it seems they will be strictly worse than my own thread which has Highest priority.
I'll still have to send everything through a single thread at some point. Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data. The N byte arrays that they deliver can't be processed on the threadpool threads because there's no guarantee that they represent N unique market data messages because TCP is stream based. I'll have to lock and put the bytes into an array anyway and signal some other thread that can process what's in the array. So I'm not sure what having N threadpool threads is buying me.
Am I thinking about this wrong? Is there a reason to use the async pattern in my specific case of one client connected to one server?
UPDATE:
So I think that I was misunderstanding the async pattern in (2) above. I would get a callback on one worker thread when there was data available. Then I would begin another async receive and get another callback, etc. I wouldn't get N callbacks at the same time.
The question is still the same though. Is there any reason that the callbacks would be better in my specific situation, where I'm the client and only connected to one server?
The slowest part of your application will be the network communication. It's highly likely that you will make almost no difference to performance for a one thread, one connection client by tweaking things like this. The network communication itself will dwarf all other contributions to processing or context switching time.
Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data.
Why is that going to happen? If you have one socket, you Begin an operation on it to receive data, and you get exactly one callback when it's done. You then decide whether to do another operation. It sounds like you're overcomplicating it, though maybe I'm oversimplifying it with regard to what you're trying to do.
In summary, I'd say: pick the simplest programming model that gets you what you want; considering choices available in your scenario, they would be unlikely to make any noticeable difference to performance whichever one you go with. With the blocking model, you're "wasting" a thread that could be doing some real work, but hey... maybe you don't have any real work for it to do.
The number one rule of performance is only try to improve it when you have to.
I see you mention standards but never mention problems. If you are not having any, then you don't need to worry about what the standards say.
"This class was specifically designed for network server applications that require high performance."
As I understand, you are a client here, having only a single connection.
Data on this connection arrives in order, consumed by a single thread.
You will probably lose performance if you instead receive small amounts on separate threads, just so that you can assemble them later in a serialized - and thus effectively single-threaded - manner.
Much Ado about Nothing.
You do not really need to speed this up, you probably cannot.
What you can do, however is to dispatch work units to other threads after you receive them.
You do not need SocketAsyncEventArgs for this. This might speed things up.
As always, measure & measure.
Also, just because you can, it does not mean you should.
If the performance is enough for the foreseeable future, why complicate matters?
I'm just trying to do some socket programming, using non-blocking sockets in C#.
The various samples that I've found, such as this, seem to use a while(true) loop, but this approach causes the CPU to spike to 100%.
Is there a way to use non-blocking sockets with an event-driven programming style?
Thanks
See the MSDN example here. The example shows how to receive data asynchronously. You can also use the Socket BeginSend/EndSend methods to send data asynchronously.
You should note that the callback delegate executes in the context of a ThreadPool thread. This is important if the data received inside the callback needs to be shared with another thread, e.g., the main UI thread that displays the data in a Windows form. If so, you will need to synchronize access to the data, using the lock keyword, for example.
As you've noticed, with nonblocking sockets and a while loop, the processor is pegged at 100%. The asynchronous model will only invoke the callback delegate when there is data to send or receive.
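A minimal sketch of that callback-driven model, using Socket.BeginReceive/EndReceive. The CallbackReceiver class, its 1024-byte buffer, and the onData delegate are assumptions for illustration, not the MSDN sample itself. No thread spins while the socket is idle; the callback runs only when data arrives or the connection ends.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

// Hypothetical sketch of an event-style receive loop built on the
// BeginReceive/EndReceive pattern.
public sealed class CallbackReceiver
{
    private readonly Socket _socket;
    private readonly byte[] _buffer = new byte[1024];
    private readonly Action<byte[]> _onData;
    private readonly ManualResetEventSlim _closed = new ManualResetEventSlim(false);

    public CallbackReceiver(Socket socket, Action<byte[]> onData)
    {
        _socket = socket;
        _onData = onData;
    }

    // Issues one asynchronous read; OnReceive is invoked when it completes.
    public void Start()
    {
        _socket.BeginReceive(_buffer, 0, _buffer.Length, SocketFlags.None, OnReceive, null);
    }

    // Blocks until the peer closes the connection or an error occurs.
    public void WaitForClose()
    {
        _closed.Wait();
    }

    // Runs on a pool thread, not on the thread that called Start().
    private void OnReceive(IAsyncResult result)
    {
        try
        {
            int read = _socket.EndReceive(result);
            if (read == 0)
            {
                _closed.Set(); // the peer closed the connection
                return;
            }
            // Copy the bytes out before the buffer is reused by the next read.
            var copy = new byte[read];
            Buffer.BlockCopy(_buffer, 0, copy, 0, read);
            _onData(copy);
            Start(); // only one read is outstanding at a time
        }
        catch (SocketException)
        {
            _closed.Set();
        }
        catch (ObjectDisposedException)
        {
            _closed.Set();
        }
    }
}
```

If onData touches state that another thread also uses, such as UI controls or a shared list, that access still needs synchronization, as noted above.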
Talking generally about blocking/non-blocking IO, applicable generally:
The key thing is that in real life your program does other things whilst not doing IO. The examples are all contrived in this way.
In blocking IO, your thread 'blocks' while waiting for IO. The OS goes and does other things, e.g. allows other threads to run. So your application can do many things (conceptually) in parallel by using many threads.
In non-blocking IO, your thread queries to see if IO is possible, and otherwise goes and does something else. So you do many things in parallel by explicitly - at an application level - swapping between them.
To avoid pegging the CPU in a tight while loop, call Thread.Sleep(100) (or a shorter interval) whenever no data was received. That gives other threads and processes a chance to run.