Safely send observable elements over awaitable TCP connection - C#

I have an observable which wraps a data source, which it continually watches and spits out changes as they occur:
IObservable<String> ReadDatasource()
{
    // Returns data from the data source, and watches it for changes
}
and within my code I have a TCP connection
interface Connection
{
    Task Send(String data);
    Boolean IsAvailable { get; }
}
which subscribes to the observable:
Connection _connection;
ReadDatasource()
    .SubscribeOn(NewThreadScheduler.Default)
    .Subscribe(
        onNext: async r =>
        {
            if (_connection.IsAvailable)
            {
                try
                {
                    await _connection.Send(r);
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                }
            }
        });
If the connection is closed by the client while the observable is spitting out large volumes of data in quick succession, there are a tonne of built up tasks still awaiting (at least I think this is the case), which then throw a tonne of exceptions due to the connection not being available (the _connection.IsAvailable has already been checked). FWIW I do not have the ability to make changes inside the _connection.Send(data) method. I have no issue waiting for the _connection.Send(data) to complete before moving on to the next element in the observable sequence. In fact, that would probably be preferable.
Is there a simple Rx style of handling this case?

there are a tonne of built up tasks still awaiting... which then throw a tonne of exceptions due to the connection not being available
Yes, that's what I would expect with this code. And there's nothing really wrong with that, since each one of those sends is in fact failing to send. If your code works fine with this, then you might just want to keep it as-is.
Otherwise...
(the _connection.IsAvailable has already been checked).
Yes. Connection.IsAvailable is useless. So are Socket.Connected / TcpClient.Connected, for that matter. They're all useless because all they tell you is whether an error has already occurred. Which you, er, already know because the last call already threw an exception. They do not provide any guarantee or even a guess as to whether the next method will succeed. This is why you need other mechanisms to detect socket connection failure.
I have no issue waiting for the _connection.Send(data) to complete before moving on to the next element in the observable sequence. In fact, that would probably be preferable.
If Connection is a simple wrapper around a socket without a write queue, then you should definitely only perform one call to Send at a time. This is due to the fact that in resource-constrained scenarios (i.e., always in production, not on your dev box), a "write" operation for a socket may only write some of the bytes to the actual network stream. I assume your Connection wrapper is handling partial writes by continuing to write until the entire data buffer is sent. This works great unless the code calls Send multiple times - in which case you can end up with bytes being out of order (A and then B are written; A partially completes and the wrapper sends the rest of A in another write... after B).
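To make the partial-write hazard concrete, here is a minimal sketch - not the questioner's Connection, just an assumed raw Socket - of the completion loop such a wrapper must contain, and why two overlapping calls can interleave bytes:

// Hypothetical helper, assuming a raw Socket; not part of the question's API.
static void SendAll(Socket socket, byte[] buffer)
{
    int sent = 0;
    while (sent < buffer.Length)
    {
        // Send may accept only part of the buffer under resource pressure;
        // keep writing from where the last call left off.
        sent += socket.Send(buffer, sent, buffer.Length - sent, SocketFlags.None);
    }
    // If two threads run this loop concurrently on the same socket, their
    // chunks can interleave on the wire - hence: one Send at a time.
}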
So, you'll need a write queue for reliable operation. If Connection already provides one, then I'd say you don't need to do anything else; the multiple Sends failing are normal. But if Connection only handles sending that single data buffer and does not queue up its write requests, then you'll need to do that yourself.
This is most easily accomplished by using a TPL Dataflow block. Specifically, ActionBlock<T>:
// Define the source observable.
var obs = ReadDatasource().SubscribeOn(NewThreadScheduler.Default);

// Create our queue which calls Send for each observable item.
var queue = new ActionBlock<string>(data => _connection.Send(data));

try
{
    // Subscribe the queue to the observable and (asynchronously) wait for it to complete.
    using (var subscription = obs.Subscribe(queue.AsObserver()))
        await queue.Completion;
}
catch (Exception ex)
{
    // The first exception thrown from Send will end up here.
    Console.WriteLine(ex.Message);
}
Dataflow blocks understand asynchronous code, and by default they only process one item at a time. So, this code will invoke Send one at a time, buffering up additional data items in a FIFO queue until that Send completes.
Dataflow blocks have a "fail fast" behavior, so the first Send that throws will fault the block, causing it to discard all remaining queued writes. When the block faults, await queue.Completion will throw, unsubscribing from the observable and displaying the message.
If the observable completes, then await queue.Completion will complete, again unsubscribing from the observable, and continue execution normally.
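If unbounded buffering during a slow or dying connection is a concern, the block can also be given options. A hedged variant - the capacity of 1000 is purely an assumption, and it's worth checking how backpressure interacts with AsObserver before relying on it:

// Sketch: limit the backlog so a stalled connection can't queue data forever.
var queue = new ActionBlock<string>(
    data => _connection.Send(data),
    new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = 1000,      // assumption: an acceptable backlog size
        MaxDegreeOfParallelism = 1   // the default; one Send at a time
    });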
For more about interfacing Rx with TPL Dataflow, see my Concurrency in C# Cookbook, recipe 7.7. You may also find this Stack Overflow answer helpful in understanding why passing an async lambda to Subscribe isn't ideal.

Related

IO operations with Sockets

I want to receive constantly while a socket is alive, and to call a send method concurrently from different threads (I'm sorry for my English).
I have thought about the following:
As there will be multiple clients, I think it would be better if the operating system were responsible for generating the threads, so as not to compromise performance; therefore, I do not want to call the Receive method in a loop on a different thread for each client. I discarded that instantly.
To avoid creating instances of the type that implements the IAsyncResult interface on each call to BeginReceive - and thus avoid stressing the GC and get better performance under a heavy load of IO operations - I decided not to use BeginReceive/EndReceive.
Instead, I've thought about using the ReceiveAsync method and a reusable SocketAsyncEventArgs instance for each connection. For example, if the server supports 1000 connections, it will have that same number of SocketAsyncEventArgs instances, one for each client for as long as the connection is alive. When the client disconnects, the underlying SocketAsyncEventArgs instance will return to the pool for later use with another connection. (SOLUTION CHOSEN)
Regarding the send operation, I do not care if the send requests are not processed in the order they were made; the important thing is that the bytes of one message do not mix with those of other messages, so that in the receive buffer on the remote host the messages arrive one after the other and the class in charge of the protocol can interpret them.
For this, I do not want to call BeginSend/EndSend, because apart from creating different IAsyncResult instances on each call, I understand there is a possibility that message bytes get mixed when calling from different threads - is that correct? I do not remember where I read it, a long time ago; I would like to have reliable sources. Regardless of whether it is true or not, I do not intend to use it.
I also understand that SendAsync is essentially a wrapper over BeginSend/EndSend, with the difference that it reuses the underlying IAsyncResult instance. Which makes me think that although I can call SendAsync concurrently, just as with BeginSend, there is also the possibility that the bytes get mixed in the output buffer.
To process multiple calls with SendAsync you would also have to provide a different SocketAsyncEventArgs instance on each call, which I do not like. Then I thought about using the Semaphore class to guarantee one send operation at a time and, once it completes, reuse the SocketAsyncEventArgs instance used in the previous send. This way I would only occupy two SocketAsyncEventArgs instances per connection (one to receive and one to send), I would avoid having a pool of these objects, and it would also prevent the bytes of different messages from being mixed in the output buffer.
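For reference, a rough sketch of that semaphore-plus-reused-SocketAsyncEventArgs idea. Names like clientSocket are assumptions carried over from the code further below; this is an untested illustration, not production code:

// Sketch: serialize sends with SemaphoreSlim so one SocketAsyncEventArgs
// instance can be reused safely.
private readonly SemaphoreSlim sendLock = new SemaphoreSlim(1, 1);
private readonly SocketAsyncEventArgs sendArgs = new SocketAsyncEventArgs();

public async Task SendAsync(byte[] buffer, int offset, int size)
{
    await sendLock.WaitAsync().ConfigureAwait(false);
    try
    {
        var tcs = new TaskCompletionSource<bool>();
        EventHandler<SocketAsyncEventArgs> handler = null;
        handler = (s, e) =>
        {
            e.Completed -= handler;
            if (e.SocketError == SocketError.Success) tcs.TrySetResult(true);
            else tcs.TrySetException(new SocketException((int)e.SocketError));
        };

        sendArgs.SetBuffer(buffer, offset, size);
        sendArgs.Completed += handler;

        if (!clientSocket.SendAsync(sendArgs))
        {
            // Completed synchronously: the Completed event will not fire.
            sendArgs.Completed -= handler;
            if (sendArgs.SocketError != SocketError.Success)
                throw new SocketException((int)sendArgs.SocketError);
        }
        else
        {
            await tcs.Task.ConfigureAwait(false);
        }
        // NOTE: production code should also verify BytesTransferred and loop
        // on partial transfers; omitted here for brevity.
    }
    finally
    {
        sendLock.Release();
    }
}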
Regarding the solution I found for receiving constantly, I am satisfied. But for the send operation I'm not sure - I do not see any advantage in using SendAsync; I only want multiple threads to be able to call the send method concurrently. So I thought about using the synchronous Send method and wrapping it in an asynchronous method like the one shown below:
public virtual async Task<int> SendAsync(byte[] buffer, int offset, int size, CancellationToken cancellationToken)
{
    ThrowIfDisposed();
    using (var sendingRequest = CancellationTokenSource.CreateLinkedTokenSource(requests.Token, cancellationToken))
    {
        int bytesSent = 0;
        // Acquire before the try block, so a cancelled wait doesn't release
        // a semaphore it never took.
        await semaphore.WaitAsync(sendingRequest.Token).ConfigureAwait(false);
        try
        {
            while (bytesSent < size)
            {
                // Advance through the buffer on partial sends.
                int bytesWrote = clientSocket.Send(buffer, offset + bytesSent, size - bytesSent, SocketFlags.None);
                if (bytesWrote == 0)
                {
                    throw new SocketException((int)SocketError.NotConnected);
                }
                bytesSent += bytesWrote;
            }
        }
        catch (SocketException)
        {
            Disconnect();
            throw;
        }
        finally
        {
            semaphore.Release();
        }
        return bytesSent;
    }
}
I would appreciate it if you could tell me where, in everything I said above, I am wrong, or whether my approach is correct, or how to improve what I already have.
Thanks.

Massive time delay after socket disconnect in Async/Await TCP server

I've been upgrading some older software from the Begin/End pattern in C# to use the new async functionality of the TcpClient class.
Long story short, this receive method works great for small numbers of connected sockets, and continues to work great for 10,000+ connections. The problem comes when these sockets disconnect.
The method I am using server side is, in essence, this (heavily simplified but still causes the problem):
private async void ReceiveDataUntilStopped(object state)
{
    while (IsConnected)
    {
        try
        {
            byte[] data = new byte[8192];
            int recvCount = await _stream.ReadAsync(data, 0, data.Length);
            if (recvCount == 0) { throw new Exception(); }
            Array.Resize(ref data, recvCount);
            Console.WriteLine(">>{0}<<", Encoding.UTF8.GetString(data));
        }
        catch { Shutdown(); return; }
    }
}
This method is called using ThreadPool.QueueUserWorkItem(ReceiveDataUntilStopped); when the connection is accepted.
To test the server, I connect 1,000 sockets. The time it takes to accept these is negligible, around 2 seconds or so. I'm very pleased with this. However, when I disconnect these 1,000 sockets, the process takes a substantial amount of time, 15 or more seconds, to handle the closure of these sockets (the Shutdown method). During this time, my server refuses any more connections. I emptied the contents of the Shutdown method to see if there was something in there blocking, but the delay remains the same.
Am I being stupid and doing something I shouldn't? I'm relatively new to the async/await pattern, but enjoying it so far.
Is this unavoidable behaviour? I understand it's unlikely in production that 1,000 sockets will disconnect at the same time, but I'd like to be able to handle a scenario like this without causing a denial of service. It strikes me as odd that the listener stops accepting new sockets, but I expect this is because all the ThreadPool threads are busy shutting down the disconnected sockets?
EDIT: While I agree that throwing an exception when 0 bytes are received is not good control flow, this is not the source of the problem. The problem is still present with simply if (recvCount == 0) { Shutdown(); return; }. This is because ReadAsync throws an IOException if the other side disconnects uncleanly. I'm also aware that I'm not handling the buffers properly, etc.; this is just an example, with minimal content, just like SO likes. I use the following code to accept clients:
private async void AcceptClientsUntilStopped()
{
    while (IsListening)
    {
        try
        {
            ServerConnection newConnection = new ServerConnection(await _listener.AcceptTcpClientAsync());
            lock (_connections) { _connections.Add(newConnection); }
            Console.WriteLine(_connections.Count);
        }
        catch { Stop(); }
    }
}
if (recvCount == 0) { throw new Exception(); }
In case of disconnect you throw an exception. Exceptions are very expensive. I benchmarked them once at 10000/sec. This is very slow.
Under the debugger, exceptions are vastly slower again (maybe 100x).
This is a misuse of exceptions for control flow. From a code quality standpoint this is really bad. Your exception handling also is really bad because it catches too much. You meant to catch socket problems but you're also swallowing all possible bugs such as NRE.
using (mySocket) // whatever you are using, maybe a TcpClient
{
    while (true)
    {
        byte[] data = new byte[8192];
        int recvCount = await _stream.ReadAsync(data, 0, data.Length);
        if (recvCount == 0) break;
        Array.Resize(ref data, recvCount);
        Console.WriteLine(">>{0}<<", Encoding.UTF8.GetString(data));
    }
    Shutdown();
}
Much better, wow.
Further issues: Inefficient buffer handling, broken UTF8 decoding (can't split UTF8 at any byte position!), usage of async void (probably, you should use Task.Run to initiate this method, or simply call it and discard the result task).
In the comments we discovered that the following works:
Start a high-priority thread and accept synchronously on it (no await). That should keep the accepting going. Fixing the exceptions is not going to be 100% possible, but: await increases the cost of exceptions because it rethrows them. It uses ExceptionDispatchInfo for that, which holds a process-global lock while doing so; that might be part of your scalability problems. You could improve perf by doing await readTask.ContinueWith(_ => { }) - that way await will never throw.
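For illustration, a hedged sketch of that ContinueWith trick applied to the question's read loop (same _stream and Shutdown names as the question):

byte[] data = new byte[8192];
Task<int> readTask = _stream.ReadAsync(data, 0, data.Length);

// Await a no-op continuation instead of the task itself: the await
// completes even if readTask faulted, so nothing is rethrown here.
await readTask.ContinueWith(_ => { });

if (readTask.Status != TaskStatus.RanToCompletion)
{
    Shutdown(); // faulted or canceled read, observed without rethrowing
    return;
}

int recvCount = readTask.Result;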
Based on the code provided and my initial understanding of the problem. I think that there are several things that you should do to address this issue.
Use async Task instead of async void. This will ensure that the async state machine knows how to actually maintain its state.
Instead of invoking ThreadPool.QueueUserWorkItem(ReceiveDataUntilStopped);, call ReceiveDataUntilStopped via await ReceiveDataUntilStopped() in the context of an async Task method.
With async await, the Task and Task<T> objects represent the asynchronous operation. If you are concerned that the results of the await are executed on the original calling thread, you could use .ConfigureAwait(false) to prevent capturing the current synchronization context. This is explained very well here and here too.
Additionally, look at how a similar "read-while" was written with this example.

Nested lock in Task.ContinueWith - Safe, or playing with fire?

Windows service: generating a set of FileSystemWatcher objects from a list of directories to watch in a config file, with the following requirements:
File processing can be time consuming - events must be handled on their own task threads
Keep handles to the event handler tasks to wait for completion in an OnStop() event.
Track the hashes of uploaded files; don't reprocess if not different
Persist the file hashes to allow OnStart() to process files uploaded while the service was down.
Never process a file more than once.
(Regarding #3, we do get events when there are no changes... most notably because of the duplicate-event issue with FileSystemWatcher)
To do these things, I have two dictionaries - one for the files uploaded, and one for the tasks themselves. Both objects are static, and I need to lock them when adding/removing/updating files and tasks. Simplified code:
public sealed class TrackingFileSystemWatcher : FileSystemWatcher {

    private static readonly object fileWatcherDictionaryLock = new object();
    private static readonly object runningTaskDictionaryLock = new object();

    private readonly Dictionary<int, Task> runningTaskDictionary = new Dictionary<int, Task>(15);
    private readonly Dictionary<string, FileSystemWatcherProperties> fileWatcherDictionary = new Dictionary<string, FileSystemWatcherProperties>();

    // Wired up elsewhere
    private void OnChanged(object sender, FileSystemEventArgs eventArgs) {
        this.ProcessModifiedDatafeed(eventArgs);
    }

    private void ProcessModifiedDatafeed(FileSystemEventArgs eventArgs) {

        lock (TrackingFileSystemWatcher.fileWatcherDictionaryLock) {

            // Read the file and generate hash here

            // Get the properties if the file has been processed before
            // (ContainsNonNullKey is an extension method)
            FileSystemWatcherProperties fileProperties = null;
            if (this.fileWatcherDictionary.ContainsNonNullKey(eventArgs.FullPath)) {
                try {
                    fileProperties = this.fileWatcherDictionary[eventArgs.FullPath];
                }
                catch (KeyNotFoundException keyNotFoundException) {}
                catch (ArgumentNullException argumentNullException) {}
            }
            else {
                // Create a new properties object
            }

            fileProperties.ChangeType = eventArgs.ChangeType;
            fileProperties.FileContentsHash = md5Hash;
            fileProperties.LastEventTimestamp = DateTime.Now;

            Task task;
            try {
                task = new Task(() => new DatafeedUploadHandler().UploadDatafeed(this.legalOrg, datafeedFileData), TaskCreationOptions.LongRunning);
            }
            catch {
                ..
            }

            // Only lock long enough to add the task to the dictionary
            lock (TrackingFileSystemWatcher.runningTaskDictionaryLock) {
                try {
                    this.runningTaskDictionary.Add(task.Id, task);
                }
                catch {
                    ..
                }
            }

            try {
                task.ContinueWith(t => {
                    try {
                        lock (TrackingFileSystemWatcher.runningTaskDictionaryLock) {
                            this.runningTaskDictionary.Remove(t.Id);
                        }

                        // Will this lock burn me?
                        lock (TrackingFileSystemWatcher.fileWatcherDictionaryLock) {
                            // Persist the file watcher properties to
                            // disk for recovery at OnStart()
                        }
                    }
                    catch {
                        ..
                    }
                });
                task.Start();
            }
            catch {
                ..
            }
        }
    }
}
What's the effect of requesting a lock on the FileSystemWatcher collection in the ContinueWith() delegate when the delegate is defined within a lock on the same object? I would expect it to be fine, that even if the task starts, completes, and enters the ContinueWith() before ProcessModifiedDatafeed() releases the lock, the task thread would simply be suspended until the creating thread has released the lock. But I want to make sure I'm not stepping on any delayed execution landmines.
Looking at the code, I may be able to release the lock sooner, avoiding the issue, but I'm not certain yet... need to review the full code to be sure.
UPDATE
To stem the rising "this code is terrible" comments: there are very good reasons why I catch the exceptions I do, and why I am catching so many of them. This is a Windows service with multi-threaded handlers, and it may not crash. Ever. Which it will do if any of those threads has an unhandled exception.
Also, those exceptions are written for future bulletproofing. The example I've given in comments below would be adding a factory for the handlers... as the code is written today, there will never be a null task, but if the factory is not implemented correctly, the code could throw an exception. Yes, that should be caught in testing. However, I have junior developers on my team... "May. Not. Crash." (Also, it must shut down gracefully if there is an unhandled exception, allowing currently-running threads to complete - which we do with an unhandled exception handler set in main().) We have enterprise-level monitors configured to send alerts when application errors appear in the event log - those exceptions will log and flag us. The approach was a deliberate and discussed decision.
Each possible exception has been carefully considered and chosen to fall into one of two categories - those that apply to a single datafeed and will not shut down the service (the majority), and those that indicate clear programming or other errors that fundamentally render the code useless for all datafeeds. For example, we've chosen to shut the service down if we can't write to the event log, as that's our primary mechanism for indicating datafeeds are not getting processed. The exceptions are caught locally, because the local context is the only place where the decision to continue can be made. Furthermore, allowing exceptions to bubble up to higher levels (1) violates the concept of abstraction, and (2) makes no sense in a worker thread.
I'm surprised at the number of people who argue against handling exceptions. If I had a dime for every try..catch(Exception){do nothing} I see, you'd get your change in nickels for the rest of eternity. I would argue to the death1 that if a call into the .NET framework or your own code throws an exception, you need to consider the scenario that would cause that exception to occur and explicitly decide how it should be handled. My code catches UnauthorizedAccessException in IO operations, because when I considered how that could happen, I realized that adding a new datafeed directory requires permissions to be granted to the service account (it won't have them by default).
I appreciate the constructive input... just please don't criticize simplified example code with a broad "this sucks" brush. The code does not suck - it is bulletproof, and necessarily so.
1 I would only argue a really long time if Jon Skeet disagrees
First, your question: it's not a problem in itself to request a lock inside ContinueWith. If it bothers you that you do that inside another lock block - just don't worry. Your continuation will execute asynchronously, at a different time, on a different thread.
Now, the code itself is questionable. Why do you use so many try-catch blocks around statements that almost cannot throw exceptions? For example, here:
try {
    task = new Task(() => new DatafeedUploadHandler().UploadDatafeed(this.legalOrg, datafeedFileData), TaskCreationOptions.LongRunning);
}
catch {}
You just create a task - I cannot imagine when this could throw. Same story with ContinueWith. Here:
this.runningTaskDictionary.Add(task.Id, task);
you could just check whether such a key already exists. But even that is not necessary, because task.Id is a unique id for the task instance you just created. This:
try {
    fileProperties = this.fileWatcherDictionary[eventArgs.FullPath];
}
catch (KeyNotFoundException keyNotFoundException) {}
catch (ArgumentNullException argumentNullException) {}
is even worse. You should not use exceptions like this - don't catch KeyNotFoundException; use the appropriate methods on Dictionary (like TryGetValue).
So, to start with, remove all the try-catch blocks and either use one for the whole method, or use them only on statements that can really throw exceptions, where you cannot handle the situation otherwise (and you know what to do with the exception thrown).
Then, your approach to handling filesystem events is not quite scalable and reliable. Many programs generate multiple change events in short intervals when saving changes to a file (and there are other cases where multiple events for the same file arrive in sequence). If you just start processing the file on every event, this can lead to different kinds of trouble. So you might need to throttle the events coming for a given file and only start processing after a certain delay after the last detected change. That might be a bit advanced stuff, though.
Don't forget to grab a read lock on the file as soon as possible, so that other processes cannot change the file while you are working with it (for example, you might calculate the md5 of a file, then someone changes the file, then you start uploading - now your md5 is invalid). Another approach is to record the last write time and, when it comes to uploading, grab a read lock and check whether the file was changed in between.
What is more important is that there can be a lot of changes at once. Say I copied 1000 files very fast - you do not want to start uploading them all at once with 1000 threads. You need a queue of files to process, and several threads taking items from that queue. This way thousands of events can happen at once and your upload will still work reliably. Right now you create a new thread for each change event and immediately start the upload (according to the method names) - this will fail under a serious load of events (and in the cases described above).
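A rough sketch of that queue, under assumptions: a BlockingCollection (System.Collections.Concurrent) of file paths, and a hypothetical single-argument adaptation of the question's DatafeedUploadHandler:

// Change events only enqueue paths; a fixed set of workers drains the
// queue, so 1000 near-simultaneous events don't spawn 1000 uploads.
private readonly BlockingCollection<string> uploadQueue = new BlockingCollection<string>();

public void StartWorkers(int workerCount)
{
    for (int i = 0; i < workerCount; i++)
    {
        Task.Factory.StartNew(() =>
        {
            foreach (string path in uploadQueue.GetConsumingEnumerable())
            {
                // Hypothetical adaptation of the question's handler.
                new DatafeedUploadHandler().UploadDatafeed(path);
            }
        }, TaskCreationOptions.LongRunning);
    }
}

private void OnChanged(object sender, FileSystemEventArgs e)
{
    uploadQueue.Add(e.FullPath); // cheap; returns immediately
}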
No, it will not burn you. Even if the ContinueWith is inlined onto the current thread that was running the new Task(() => new DatafeedUploadHandler()..., it will still get the lock, i.e., no deadlock.
The lock statement uses the Monitor class internally, and it is reentrant, i.e., a thread can acquire a lock multiple times if it already owns it. See Multithreading and Locking (Thread-Safe operations).
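A tiny demonstration of that reentrancy:

private static readonly object gate = new object();

static void Outer()
{
    lock (gate)
    {
        Inner(); // same thread takes the same lock again below - no deadlock
    }
}

static void Inner()
{
    lock (gate) { Console.WriteLine("re-entered"); }
}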
And the other case, where the task.ContinueWith starts before ProcessModifiedDatafeed has finished, is as you said: the thread running the ContinueWith would simply have to wait to get the lock.
I would really consider doing the task.ContinueWith and the task.Start() outside of the lock once you have reviewed it. And it is possible, based on your posted code.
You should also take a look at ConcurrentDictionary in the System.Collections.Concurrent namespace. It would make the code easier, and you don't have to manage the locking yourself. You are doing a kind of compare-exchange/update here: if (this.fileWatcherDictionary.ContainsNonNullKey(eventArgs.FullPath)) - i.e., only add if not already in the dictionary - which should be one atomic operation. ConcurrentDictionary provides GetOrAdd and AddOrUpdate methods for exactly this, so maybe you can rewrite it using those. And based on your code, you could safely use the ConcurrentDictionary at least for the runningTaskDictionary.
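For instance, a minimal sketch of the runningTaskDictionary rewritten without explicit locks (sketch only, not the poster's full code):

// The explicit lock objects disappear entirely.
var runningTasks = new ConcurrentDictionary<int, Task>();

runningTasks.TryAdd(task.Id, task); // atomic add, no lock needed

task.ContinueWith(t =>
{
    Task removed;
    runningTasks.TryRemove(t.Id, out removed); // atomic remove
});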
Oh, and TaskCreationOptions.LongRunning literally creates a new thread for every task, which is an expensive operation. The internal Windows thread pool is intelligent in newer Windows versions and adapts dynamically: it will "see" that you are doing lots of IO stuff and will spawn new threads as needed and practical.
Greetings
I have not fully followed the logic of this code but are you aware that task continuations and calls to Wait/Result can be inlined onto the current thread? This can cause reentrancy.
This is very dangerous and has burned many.
Also, I don't quite see why you are starting the task delayed. This is a code smell. And why are you wrapping the task creation in a try? That can never throw.
This is clearly a partial answer. But the code looks very tangled to me. If it's this hard to audit, you should probably write it differently in the first place.

Preventing task from running on certain thread

I have been struggling a bit with some async await stuff. I am using RabbitMQ for sending/receiving messages between some programs.
As a bit of background, the RabbitMQ client uses 3 or so threads that I can see: a connection thread and two heartbeat threads. Whenever a message is received via TCP, the connection thread handles it and calls a callback which I have supplied via an interface. The documentation says it is best to avoid doing lots of work during this call, since it's done on the same thread as the connection and things need to continue on. They supply a QueueingBasicConsumer, which has a blocking Dequeue method used to wait for a message to be received.
I wanted my consumers to be able to actually release their thread context during this waiting time so somebody else could do some work, so I decided to use async/await tasks. I wrote an AwaitableBasicConsumer class which uses TaskCompletionSources in the following fashion:
I have an awaitable Dequeue method:
public Task<RabbitMQ.Client.Events.BasicDeliverEventArgs> DequeueAsync(CancellationToken cancellationToken)
{
    // We are enqueueing a TCS. This is a "read".
    rwLock.EnterReadLock();
    try
    {
        TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs> tcs = new TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs>();

        // If we are cancelled before we finish, this will cause the tcs to become cancelled.
        cancellationToken.Register(() =>
        {
            tcs.TrySetCanceled();
        });

        // If there is something in the undelivered queue, the task will be immediately
        // completed; otherwise, we queue the task into deliveryTCS.
        if (!TryDeliverUndelivered(tcs))
            deliveryTCS.Enqueue(tcs);

        return tcs.Task;
    }
    finally
    {
        rwLock.ExitReadLock();
    }
}
The callback which the RabbitMQ client calls fulfills the tasks. This is called from the context of the AMQP connection thread:
public void HandleBasicDeliver(string consumerTag, ulong deliveryTag, bool redelivered, string exchange, string routingKey, RabbitMQ.Client.IBasicProperties properties, byte[] body)
{
    // We want nothing added while we remove. We also block until everybody is done.
    rwLock.EnterWriteLock();
    try
    {
        RabbitMQ.Client.Events.BasicDeliverEventArgs e = new RabbitMQ.Client.Events.BasicDeliverEventArgs(consumerTag, deliveryTag, redelivered, exchange, routingKey, properties, body);
        bool sent = false;
        TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs> tcs;
        while (deliveryTCS.TryDequeue(out tcs))
        {
            // Once we manage to actually set somebody's result, we are done with handling this.
            if (tcs.TrySetResult(e))
            {
                sent = true;
                break;
            }
        }

        /**
         * If nothing was sent, we queue up what we got so that somebody can get it later.
         *
         * Without the rwLock, this logic would cause concurrency problems in the case where,
         * after the while block completes without sending, somebody enqueues themselves. They
         * would get the next message, and the person who enqueued after them would get the
         * message received now. Locking prevents that from happening, since nobody can add to
         * the queue while we are doing our thing here.
         */
        if (!sent)
        {
            undelivered.Enqueue(e);
        }
    }
    finally
    {
        rwLock.ExitWriteLock();
    }
}
rwLock is a ReaderWriterLockSlim. The two queues (deliveryTCS and undelivered) are ConcurrentQueues.
The problem:
Every once in a while, the method that awaits the dequeue method throws an exception. This would not normally be an issue since that method is also async and so it enters the "Exception" completion state that tasks enter. The problem comes in the situation where the task that calls DequeueAsync is resumed after the await on the AMQP Connection thread that the RabbitMQ client creates. Normally I have seen tasks resume onto the main thread or one of the worker threads floating around. However, when it resumes onto the AMQP thread and an exception is thrown, everything stalls. The task does not enter its "Exception state" and the AMQP Connection thread is left saying that it is executing the method that had the exception occur.
My main confusion here is why this doesn't work:
var task = c.RunAsync(); // <-- This method awaits the DequeueAsync and throws an exception afterwards

ConsumerTaskState state = new ConsumerTaskState()
{
    Connection = connection,
    CancellationToken = cancellationToken
};

// If there is a problem, we execute our faulted method.
// PROBLEM: If the task fails when it's resumed onto the AMQP thread, this method is never called.
task.ContinueWith(this.OnFaulted, state, TaskContinuationOptions.OnlyOnFaulted);
Here is the RunAsync method, set up for the test:
public async Task RunAsync()
{
    using (var channel = this.Connection.CreateModel())
    {
        ...
        AwaitableBasicConsumer consumer = new AwaitableBasicConsumer(channel);
        var result = consumer.DequeueAsync(this.CancellationToken);

        // Wait until we find something to eat.
        await result;

        throw new NotImplementedException(); // <-- the test exception. Normally this causes OnFaulted to be called, but sometimes it stalls
        ...
    } // <-- This is where the debugger says the thread is sitting when I find it in the stalled state
}
Reading what I have written, I see that I may not have explained my problem very well. If clarification is needed, just ask.
My solutions that I have come up with are as follows:
Remove all the async/await code and just use straight-up threads and blocking. Performance will be decreased, but at least it won't stall sometimes.
Somehow exempt the AMQP threads from being used for resuming tasks. I assume they were sleeping or something and then the default TaskScheduler decided to use them. If I could find a way to tell the task scheduler that those threads are off limits, that would be great.
Does anyone have an explanation for why this is happening or any suggestions to solving this? Right now I am removing the async code just so that the program is reliable, but I really want to understand what is going on here.
I first recommend that you read my async intro, which explains in precise terms how await will capture a context and use that to resume execution. In short, it will capture the current SynchronizationContext (or the current TaskScheduler if SynchronizationContext.Current is null).
The other important detail is that async continuations are scheduled with TaskContinuationOptions.ExecuteSynchronously (as #svick pointed out in a comment). I have a blog post about this, but AFAIK it is not officially documented anywhere. This detail makes writing an async producer/consumer queue difficult.
The reason await isn't "switching back to the original context" is (probably) because the RabbitMQ threads don't have a SynchronizationContext or TaskScheduler - thus, the continuation is executed directly when you call TrySetResult because those threads look just like regular thread pool threads.
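(A hedged aside: on .NET Framework 4.6 and later, a TaskCompletionSource can opt out of inline continuations entirely, which directly addresses this problem:)

// RunContinuationsAsynchronously forces continuations onto the thread pool
// instead of running them inline inside TrySetResult - so awaiter code
// never executes on the AMQP connection thread.
var tcs = new TaskCompletionSource<RabbitMQ.Client.Events.BasicDeliverEventArgs>(
    TaskCreationOptions.RunContinuationsAsynchronously);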
BTW, reading through your code, I suspect your use of a reader/writer lock and concurrent queues are incorrect. I can't be sure without seeing the whole code, but that's my impression.
I strongly recommend you use an existing async queue and build a consumer around that (in other words, let someone else do the hard part :). The BufferBlock<T> type in TPL Dataflow can act as an async queue; that would be my first recommendation if you have Dataflow available on your platform. Otherwise, I have an AsyncProducerConsumerQueue type in my AsyncEx library, or you could write your own (as I describe on my blog).
Here's an example using BufferBlock<T>:
private readonly BufferBlock<RabbitMQ.Client.Events.BasicDeliverEventArgs> _queue =
    new BufferBlock<RabbitMQ.Client.Events.BasicDeliverEventArgs>();

public void HandleBasicDeliver(string consumerTag, ulong deliveryTag, bool redelivered, string exchange, string routingKey, RabbitMQ.Client.IBasicProperties properties, byte[] body)
{
    RabbitMQ.Client.Events.BasicDeliverEventArgs e = new RabbitMQ.Client.Events.BasicDeliverEventArgs(consumerTag, deliveryTag, redelivered, exchange, routingKey, properties, body);
    _queue.Post(e);
}

public Task<RabbitMQ.Client.Events.BasicDeliverEventArgs> DequeueAsync(CancellationToken cancellationToken)
{
    return _queue.ReceiveAsync(cancellationToken);
}
In this example, I'm keeping your DequeueAsync API. However, once you start using TPL Dataflow, consider using it elsewhere as well. When you need a queue like this, it's common to find other parts of your code that would also benefit from a dataflow approach. E.g., instead of having a bunch of methods calling DequeueAsync, you could link your BufferBlock to an ActionBlock.
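For example, a sketch of that linking (the handler body is a placeholder assumption):

var processor = new ActionBlock<RabbitMQ.Client.Events.BasicDeliverEventArgs>(
    e => Console.WriteLine("received {0} bytes", e.Body.Length));

// Messages posted to _queue now flow straight into the processor;
// PropagateCompletion lets completion and faults flow through as well.
_queue.LinkTo(processor, new DataflowLinkOptions { PropagateCompletion = true });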

Server design using SocketAsyncEventArgs

I want to create an asynchronous socket Server using the SocketAsyncEventArgs event.
The server should manage about 1000 connections at the same time. What is the best way to handle the logic for each packet?
The server design is based on this MSDN example, so every socket will have its own SocketAsyncEventArgs for receiving data.
Do the logic stuff inside the receive function.
No overhead will be created, but since the next ReceiveAsync() call won't be made before the logic has completed, new data can't be read from the socket. The two main questions for me are: if the client sends a lot of data and the logic processing is heavy, how will the system handle it (packets lost because the buffer is too full)? Also, if all clients send data at the same time, will there be 1000 threads, or is there an internal limit such that a new thread can't start before another one completes execution?
Use a queue.
The receive function will be very short and execute fast, but you'll have decent overhead because of the queue. Problems are: if your worker threads are not fast enough under heavy server load, your queue can fill up, so maybe you'll have to force packet drops. You also get the producer/consumer problem, which can probably slow down the entire queue with too many locks.
So which will be the better design: logic in the receive function, logic in worker threads, or something completely different I've missed so far?
Another question regarding data sending.
Is it better to have a SocketAsyncEventArgs tied to a socket (analogous to the receive event) and use a buffer system to make one send call for a few small packets (say, packets that would otherwise sometimes be sent directly one after another), or to use a different SocketAsyncEventArgs for every packet and store them in a pool to reuse them?
To effectively implement async sockets, each socket will need more than one SocketAsyncEventArgs. There is also an issue with the byte[] buffer in each SocketAsyncEventArgs: in short, the byte buffers will be pinned whenever a managed-to-native transition occurs (sending/receiving). If you allocate SocketAsyncEventArgs and byte buffers as needed, you can run into OutOfMemoryExceptions with many clients, due to fragmentation and the GC's inability to compact pinned memory.
The best way to handle this is to create a SocketBufferPool class that allocates a large number of bytes and SocketAsyncEventArgs instances when the application first starts; that way the pinned memory will be contiguous. Then simply reuse the buffers from the pool as needed.
In practice I've found it best to create a wrapper class around the SocketAsyncEventArgs and a SocketBufferPool class to manage the distribution of resources.
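Before the BeginReceive example below, here is a minimal sketch of the pooling idea (a simplified, hypothetical version of the SocketBufferPool the code refers to; the real class also wraps the args in a SocketEventArgs wrapper and exposes an Instance singleton):

public sealed class SocketBufferPool
{
    private readonly ConcurrentStack<SocketAsyncEventArgs> pool =
        new ConcurrentStack<SocketAsyncEventArgs>();

    public SocketBufferPool(int count, int bufferSize)
    {
        // One contiguous slab: pinning slices of it during IO
        // doesn't fragment the rest of the heap.
        byte[] slab = new byte[count * bufferSize];
        for (int i = 0; i < count; i++)
        {
            var args = new SocketAsyncEventArgs();
            args.SetBuffer(slab, i * bufferSize, bufferSize);
            pool.Push(args);
        }
    }

    public SocketAsyncEventArgs Alloc()
    {
        SocketAsyncEventArgs args;
        if (!pool.TryPop(out args))
            throw new InvalidOperationException("Pool exhausted.");
        return args;
    }

    public void Free(SocketAsyncEventArgs args)
    {
        pool.Push(args);
    }
}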
As an example, here is the code for a BeginReceive method:
private void BeginReceive(Socket socket)
{
    Contract.Requires(socket != null, "socket");

    SocketEventArgs e = SocketBufferPool.Instance.Alloc();
    e.Socket = socket;
    e.Completed += new EventHandler<SocketEventArgs>(this.HandleIOCompleted);

    if (!socket.ReceiveAsync(e.AsyncEventArgs)) {
        this.HandleIOCompleted(null, e);
    }
}
And here is the HandleIOCompleted method:
private void HandleIOCompleted(object sender, SocketEventArgs e)
{
    e.Completed -= this.HandleIOCompleted;
    bool closed = false;

    lock (this.sequenceLock) {
        e.SequenceNumber = this.sequenceNumber++;
    }

    switch (e.LastOperation) {
        case SocketAsyncOperation.Send:
        case SocketAsyncOperation.SendPackets:
        case SocketAsyncOperation.SendTo:
            if (e.SocketError == SocketError.Success) {
                this.OnDataSent(e);
            }
            break;

        case SocketAsyncOperation.Receive:
        case SocketAsyncOperation.ReceiveFrom:
        case SocketAsyncOperation.ReceiveMessageFrom:
            if ((e.BytesTransferred > 0) && (e.SocketError == SocketError.Success)) {
                this.BeginReceive(e.Socket);
                if (this.ReceiveTimeout > 0) {
                    this.SetReceiveTimeout(e.Socket);
                }
            } else {
                closed = true;
            }

            if (e.SocketError == SocketError.Success) {
                this.OnDataReceived(e);
            }
            break;

        case SocketAsyncOperation.Disconnect:
            closed = true;
            break;

        case SocketAsyncOperation.Accept:
        case SocketAsyncOperation.Connect:
        case SocketAsyncOperation.None:
            break;
    }

    if (closed) {
        this.HandleSocketClosed(e.Socket);
    }

    SocketBufferPool.Instance.Free(e);
}
The above code is contained in a TcpSocket class that raises DataReceived and DataSent events. One thing to notice is the case SocketAsyncOperation.ReceiveMessageFrom: block; if the socket hasn't had an error, it immediately starts another BeginReceive(), which will allocate another SocketEventArgs from the pool.
Another important note is the SocketEventArgs SequenceNumber property, set in the HandleIOCompleted method. Although async requests complete in the order queued, you are still subject to other thread race conditions. Since the code calls BeginReceive before raising the DataReceived event, there is a possibility that the thread servicing the original IOCP will block after calling BeginReceive but before raising the event, while the second async receive completes on a new thread and raises its DataReceived event first. Although this is a fairly rare edge case, it can occur, and the SequenceNumber property gives the consuming app the ability to ensure that data is processed in the correct order.
One other area to be aware of is async sends. Oftentimes, async send requests complete synchronously (SendAsync returns false if the call completed synchronously), and this can severely degrade performance. The additional overhead of the async call coming back on an IOCP can in practice cause worse performance than simply using the synchronous call: the async call requires two kernel calls and a heap allocation, while the synchronous call happens on the stack.
Hope this helps,
Bill
In your code, you do this:
if (!socket.ReceiveAsync(e.AsyncEventArgs)) {
    this.HandleIOCompleted(null, e);
}
But it is an error to do that. There is a reason the callback is not invoked when the operation finishes synchronously: invoking it anyway can fill up the stack.
Imagine that each ReceiveAsync always returns synchronously. If your HandleIOCompleted were in a while loop, you could process a synchronously returned result at the same stack level. If it didn't return synchronously, you'd break out of the while.
But by doing it the way you do, you end up creating a new frame on the stack for each synchronous completion... so with enough bad luck, you will cause stack overflow exceptions.
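A sketch of the loop-based pattern this describes (ProcessReceive is a hypothetical handler, not from the answer above):

private void BeginReceive(Socket socket, SocketAsyncEventArgs e)
{
    // As long as operations complete synchronously, handle them here,
    // at the same stack level, instead of recursing into the callback.
    while (!socket.ReceiveAsync(e))
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0)
            return; // failed or closed; stop issuing receives

        ProcessReceive(e);
    }
    // ReceiveAsync returned true: the completion will arrive on the
    // Completed event, which processes the data and calls this again.
}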
