Apparent BufferBlock.Post/Receive/ReceiveAsync race/bug - c#

cross-posted to http://social.msdn.microsoft.com/Forums/en-US/tpldataflow/thread/89b3f71d-3777-4fad-9c11-50d8dc81a4a9
I know... I'm not really using TPL Dataflow to its maximum potential. At the moment I'm simply using BufferBlock as a thread-safe queue for message passing, where producer and consumer run at different rates. I'm seeing some strange behaviour that leaves me stumped as to how to proceed.
private BufferBlock<object> messageQueue = new BufferBlock<object>();
public void Send(object message)
{
var accepted=messageQueue.Post(message);
logger.Info("Send message was called qlen = {0} accepted={1}",
messageQueue.Count,accepted);
}
public async Task<object> GetMessageAsync()
{
try
{
var m = await messageQueue.ReceiveAsync(TimeSpan.FromSeconds(30));
//despite messageQueue.Count>0 next line
//occasionally does not execute
logger.Info("message received");
//.......
}
catch(TimeoutException)
{
//do something
}
}
In the code above (which is part of a 2000 line distributed solution), Send is being called periodically every 100ms or so. This means an item is Posted to messageQueue at around 10 times a second. This is verified. However, occasionally it appears that ReceiveAsync does not complete within the timeout (i.e. the Post is not causing ReceiveAsync to complete) and TimeoutException is being raised after 30s. At this point, messageQueue.Count is in the hundreds. This is unexpected. This problem has been observed at slower rates of posting too (1 post/second) and usually happens before 1000 items have passed through the BufferBlock.
So, to work around this issue, I am using the following code, which works but occasionally adds up to 1s of latency when receiving (whenever the bug above occurs):
public async Task<object> GetMessageAsync()
{
try
{
object m;
var attempts = 0;
for (; ; )
{
try
{
m = await messageQueue.ReceiveAsync(TimeSpan.FromSeconds(1));
}
catch (TimeoutException)
{
attempts++;
if (attempts >= 30) throw;
continue;
}
break;
}
logger.Info("message received");
//.......
}
catch(TimeoutException)
{
//do something
}
}
This looks like a race condition in TDF to me, but I can't get to the bottom of why this doesn't occur in the other places where I use BufferBlock in a similar fashion. Experimentally changing from ReceiveAsync to Receive doesn't help. I haven't checked, but I imagine in isolation, the code above works perfectly. It's a pattern I've seen documented in "Introduction to TPL Dataflow" tpldataflow.docx.
What can I do to get to the bottom of this? Are there any metrics that might help infer what's happening? If I can't create a reliable test case, what more information can I offer?
Help!

Stephen seems to think the following is the solution
var m = await messageQueue.ReceiveAsync();
instead of:
var m = await messageQueue.ReceiveAsync(TimeSpan.FromSeconds(30));
Can you confirm or deny this?
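One way to try the suggestion above without giving up the timeout is to call the ReceiveAsync overload that takes a CancellationToken and let a CancellationTokenSource do the timing. The following is a minimal sketch, not a confirmed fix for the behaviour described in the question; the MessageQueue class and its member names are made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical wrapper, for illustration only.
public sealed class MessageQueue
{
    private readonly BufferBlock<object> _queue = new BufferBlock<object>();

    // Post returns false only if the block declined the message.
    public bool Send(object message) => _queue.Post(message);

    // Returns the next message, or null if nothing arrives within the timeout.
    public async Task<object> GetMessageAsync(TimeSpan timeout)
    {
        // The token source cancels itself once the timeout elapses.
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                return await _queue.ReceiveAsync(cts.Token);
            }
            catch (OperationCanceledException)
            {
                // Timed out: no message was taken from the queue.
                return null;
            }
        }
    }
}
```

With this shape, a timeout surfaces as an OperationCanceledException rather than a TimeoutException, so any existing catch blocks would need to change accordingly.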

Related

Calling async method without awaiting blocks execution of the rest of ASP.NET Core services

I'm currently working on an ASP.NET Core web app, which consists of a web server and two long-running services: a TCP server (for managing my own clients) and a TCP client (integration with an external platform).
Both services run alongside the web server. I achieved that by making them inherit from BackgroundService and registering them with DI like this:
services.AddHostedService(provider => provider.GetService<TcpClientService>());
services.AddHostedService(provider => provider.GetService<TcpServerService>());
Unfortunately, during development I ran into a weird issue (which doesn't let me sleep at night, so at this point I beg for your help). For some reason, async code in TcpClientService blocks execution of the other services (web server and TCP server).
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
namespace ClientService.AsyncPoblem
{
public class TcpClientService : BackgroundService
{
private readonly ILogger<TcpClientService> _logger;
private bool Connected { get; set; }
private TcpClient TcpClient { get; set; }
public TcpClientService(ILogger<TcpClientService> logger)
{
_logger = logger;
}
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
try
{
if (Connected)
{
await Task.Delay(100, stoppingToken); // check every 100ms if still connected
}
else
{
TcpClient = new TcpClient("localhost", 1234);
HandleClient(TcpClient); // <-- Call causing the issue
_logger.Log(LogLevel.Debug, "After call");
}
}
catch (Exception e)
{
// log the exception, wait for 3s and try again
_logger.Log(LogLevel.Critical, "An error occurred while trying to connect with server.");
_logger.Log(LogLevel.Critical, e.ToString());
await Task.Delay(3000, stoppingToken);
}
}
}
private async Task HandleClient(TcpClient client)
{
Connected = true;
await using var ns = client.GetStream();
using var streamReader = new StreamReader(ns);
var msgBuilder = new StringBuilder();
bool reading = false;
var buffer = new char[1024];
while (!streamReader.EndOfStream)
{
var res = await streamReader.ReadAsync(buffer, 0, 1024);
foreach (var value in buffer)
{
if (value == '\x02')
{
msgBuilder.Clear();
reading = true;
}
else if (value == '\x03')
{
reading = false;
if (msgBuilder.Length > 0)
{
Console.WriteLine(msgBuilder);
msgBuilder.Clear();
}
}
else if (value == '\x00')
{
break;
}
else if (reading)
{
msgBuilder.Append(value);
}
}
Array.Clear(buffer, 0, buffer.Length);
}
Connected = false;
}
}
}
The call causing the issue is in the else branch of the ExecuteAsync method:
else
{
TcpClient = new TcpClient("localhost", 1234);
HandleClient(TcpClient); // <-- Call causing the issue
_logger.Log(LogLevel.Debug, "After call");
}
The code reads properly from the socket, but it blocks initialization of the web server and the TCP server. In fact, even the log call is never reached. Whether or not I put await in front of HandleClient(), the code behaves the same.
I've done some tests and found that this piece of code no longer blocks (the "After call" log shows up):
else
{
TcpClient = new TcpClient("localhost", 1234);
await Task.Delay(1);
HandleClient(TcpClient); // <- moving Task.Delay into HandleClient also works
_logger.Log(LogLevel.Debug, "After call");
}
This also works like a charm (if I await Task.Run(), the "After call" log is never reached, but the rest of the app starts with no problem):
else
{
tcpClient = new TcpClient("localhost", 6969);
Connected = true;
Task.Run(() => ReceiveAsync(tcpClient));
_logger.Log(LogLevel.Debug, "After call");
}
There are a couple more combinations that make it work, but my question is: why do the other approaches work (especially the 1ms delay, which completely shuts down my brain) while firing HandleClient() without await doesn't? I know that fire-and-forget may not be the most elegant solution, but it should work and do its job, shouldn't it? I searched for almost a month and still didn't find a single explanation for that. At this point I have a hard time falling asleep at night, because I have no one to ask and can't stop thinking about it.
Update
(Sorry for disappearing for over a day without any answers)
After many, many hours of investigation, I started debugging once again. Every time I hit the while loop in HandleClient(), I lost control of the debugger; the program seemed to keep running, but it never reached await streamReader.ReadAsync(). At some point I decided to change the condition of the while loop to true (I have no idea why I didn't think of trying it before), and everything began to work as expected. Messages were read from the TCP socket, and the other services started up without any issues.
Here is the piece of code causing the issue:
while (!streamReader.EndOfStream) // <----- issue
{
var res = await streamReader.ReadAsync(buffer, 0, 1024);
// ...
After that observation, I decided to print out the result of EndOfStream before reaching the loop, to see what happens:
Console.WriteLine(streamReader.EndOfStream);
while (!streamReader.EndOfStream)
{
var res = await streamReader.ReadAsync(buffer, 0, 1024);
// ...
Now the exact same thing was happening, but before even reaching the loop!
Explanation
Note:
I'm not a senior programmer, especially when it comes to asynchronous TCP communication, so I might be wrong here, but I will do my best.
streamReader.EndOfStream is not a regular field; it is a property, and it has logic inside its getter.
This is what it looks like from the inside:
public bool EndOfStream
{
get
{
ThrowIfDisposed();
CheckAsyncTaskInProgress();
if (_charPos < _charLen)
{
return false;
}
// This may block on pipes!
int numRead = ReadBuffer();
return numRead == 0;
}
}
The EndOfStream getter is synchronous. To detect whether the stream has ended, it calls ReadBuffer(). Since there is no data in the buffer yet and the stream hasn't ended, the call hangs until there is some data to read. Unfortunately, this means it cannot safely be used in an asynchronous context: whenever the buffer is empty, it blocks the calling thread (unfortunate, because it seems to be the only way to instantly detect an interrupted connection, a broken cable or the end of the stream).
I don't have a finished piece of code yet; I need to rewrite it and add some broken-connection detection. I will post my solution as soon as I finish.
I would like to thank everyone for trying to help me, and especially @RoarS., who took the biggest part in the discussion and spent some of his own time taking a closer look at my issue.
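Following the EndOfStream explanation above, one way to avoid the blocking getter is to loop on the value returned by ReadAsync, which is 0 only once the stream has ended. This is a rough sketch of that idea rather than the author's finished solution; FrameReader and ReadFramesAsync are hypothetical names, and the STX/ETX framing is copied from the question. It collects messages in a list instead of writing them to the console, and it does not attempt any broken-connection detection.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class FrameReader
{
    private const char Stx = '\u0002'; // start of a message
    private const char Etx = '\u0003'; // end of a message

    public static async Task<List<string>> ReadFramesAsync(Stream stream)
    {
        var frames = new List<string>();
        var current = new StringBuilder();
        var reading = false;
        var buffer = new char[1024];

        using (var reader = new StreamReader(stream))
        {
            int read;
            // ReadAsync returns 0 only at end of stream, so the
            // synchronous EndOfStream getter is never touched.
            while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first 'read' chars are valid, so the buffer
                // does not need to be cleared between iterations.
                for (var i = 0; i < read; i++)
                {
                    var c = buffer[i];
                    if (c == Stx)
                    {
                        current.Clear();
                        reading = true;
                    }
                    else if (c == Etx)
                    {
                        reading = false;
                        if (current.Length > 0) frames.Add(current.ToString());
                        current.Clear();
                    }
                    else if (reading)
                    {
                        current.Append(c);
                    }
                }
            }
        }
        return frames;
    }
}
```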
This is poorly documented behaviour of the BackgroundService class. All registered IHostedService instances are started sequentially, in the order they were registered. The application will not start until each IHostedService has returned from StartAsync. A BackgroundService is an IHostedService that starts your ExecuteAsync task before returning from StartAsync. Async methods run synchronously until their first await of an incomplete task, and only then return to the caller.
TL;DR: If your ExecuteAsync method never awaits an incomplete task, the server will never start.
Since you aren't awaiting that async method, your code boils down to:
while(true)
HandleClient(...);
(Do you really want to spawn an infinite number of TcpClient instances as fast as the CPU will go?) There's a really easy fix:
protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
await Task.Yield();
// ...
}
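The rule behind this fix (an async method runs on the caller's thread until its first await of an incomplete task) can be seen without any hosting infrastructure. The sketch below is only an illustration; YieldDemo and its methods are hypothetical names, and the behaviour described assumes no SynchronizationContext is installed, as in a console app.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo type, for illustration only.
public static class YieldDemo
{
    // Nothing here awaits an incomplete task, so the whole body,
    // including blockingWork, runs on the caller's thread.
    public static async Task WithoutYieldAsync(Action blockingWork)
    {
        blockingWork();
        await Task.CompletedTask; // already complete: no yield happens
    }

    // Task.Yield always yields, so the caller gets its Task back
    // before blockingWork starts running.
    public static async Task WithYieldAsync(Action blockingWork)
    {
        await Task.Yield();
        blockingWork();
    }
}
```

Calling WithoutYieldAsync from StartAsync-like code would hold up the caller for as long as blockingWork runs, which is the situation the answer describes; WithYieldAsync hands control back first.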

Why does my delay task not trigger the Task.WaitAny()

So I have the following:
if (await Task.WhenAny(myLongRunningTask, Task.Delay(9000)) != myLongRunningTask && !myLongRunningTask.IsFaulted)
{
Console.WriteLine("TimedOut");
}
else
{
Console.WriteLine("Completed");
}
This seems to work fine when my task completes, but if the long-running task never completes, it just hangs instead of timing out after 9 seconds. I even put in the faulted test to be sure.
(NOTE: The real code does more than just write to the console, but it never even enters either branch. I did try just writing to the console too... no change.)
Yet doing what looks to me to be exactly the same thing in LinqPad:
async void Main()
{
var Other = Task.Factory.StartNew(() => { Thread.Sleep(15000); });
if(await Task.WhenAny(Other, Task.Delay(4000)) != Other && !Other.IsFaulted) "TimedOut".Dump();
else "Completed".Dump();
Other = Task.Factory.StartNew(() => { Thread.Sleep(1000); });
if(await Task.WhenAny(Other, Task.Delay(4000)) != Other && !Other.IsFaulted) "TimedOut".Dump();
else "Completed".Dump();
}
That happily writes TimedOut then Completed.
The first code section is deep down in a pretty huge module. But I can't see what might side-effect it to this odd behaviour... What might I be missing?
NOTE on accepted answer:
The question here was what might be side-effecting this. @Douglas' answer indicates the side effect that was impacting my code. That does not necessarily fix the issue; it just tells you where the problem lies. However, he helpfully added a comment with a link to an article that does help you fix it.
This might happen if you're blocking on that async method from a synchronization context that executes continuations on the original thread, such as a WPF or WinForms UI. Try adding ConfigureAwait(false) and check whether that avoids the hanging:
var readyTask = await Task.WhenAny(myLongRunningTask, Task.Delay(9000)).ConfigureAwait(false);
if (readyTask != myLongRunningTask && !myLongRunningTask.IsFaulted)
{
Console.WriteLine("TimedOut");
}
else
{
Console.WriteLine("Completed");
}
But I can't see what might side-effect it to this odd behaviour...
Your myLongRunningTask is likely assigned the return value of a method like:
private static Task NewMethod()
{
Thread.Sleep(11000);
return Task.CompletedTask;
}
then later:
var myLongRunningTask = NewMethod();
Thus, by the time you reach the var readyTask = await line, the 11-second sleep (the 'long running task') has already happened (since NewMethod is not asynchronous), and so Task.WhenAny returns instantly (since myLongRunningTask is already complete).
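The WhenAny-based timeout discussed in the answers above can be wrapped in a small helper. This is a sketch under the assumption that the task passed in really is asynchronous; if the method that creates it blocks before returning, as in the NewMethod example, no helper can time it out. TimeoutHelper and CompletesWithinAsync are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class TimeoutHelper
{
    // Returns true if 'task' finished within 'timeout', false otherwise.
    public static async Task<bool> CompletesWithinAsync(Task task, TimeSpan timeout)
    {
        // ConfigureAwait(false) keeps the continuation off any captured
        // synchronization context, so a blocked UI thread cannot stall it.
        var winner = await Task.WhenAny(task, Task.Delay(timeout)).ConfigureAwait(false);
        if (winner != task)
        {
            return false; // the delay finished first
        }

        // Await the finished task so that any exception it holds is rethrown.
        await task.ConfigureAwait(false);
        return true;
    }
}
```

A faulted task counts as having completed within the timeout here, and its exception propagates to the caller instead of being checked through IsFaulted.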

Alternative in a situation of recurring Task demand

I have an observer module that takes care of subscriptions to a reactive stream I created from Kafka. Sadly, I need to call Poll in order to receive messages from Kafka, so I need to dedicate one background thread to that. My first solution was this one:
public void Poll()
{
if (Interlocked.Exchange(ref _state, POLLING) == NOTPOLLING)
{
Task.Run(() =>
{
while (CurrentSubscriptions.Count != 0)
{
_consumer.Poll(TimeSpan.FromSeconds(1));
}
_state = NOTPOLLING;
});
}
}
Now my reviewer suggested that I should use a Task, because it has statuses and can be checked for whether it is running or not. This led to this code:
public void Poll()
{
// checks for statuses: WaitingForActivation, WaitingToRun, Running
if (_runningStatuses.Contains(_pollingTask.Status)) return;
_pollingTask.Start(); // this obviously throws exception once Task already completes and then I want to start it again
}
The task itself remained pretty much the same, but the check changed. Since my logic is that I want to start polling when I have subscriptions and stop when I don't, I need to sort of re-use the Task, which I can't. So I am wondering: do I need to go back to my first implementation, or is there some other neat way of doing this that I am missing right now?
I am wondering: do I need to go back to my first implementation, or is there some other neat way of doing this that I am missing right now?
Your first implementation looks fine. You might use a ManualResetEventSlim instead of enum and Interlocked.Exchange, but that's essentially the same... as long as you have just two states.
I think I made a compromise and replaced the Interlocked API with [MethodImpl(MethodImplOptions.Synchronized)]. It lets me have a simple method body, without Interlocked code that might confuse a newcomer or less experienced developer.
[MethodImpl(MethodImplOptions.Synchronized)]
public void Poll()
{
if (!_polling)
{
_polling = true;
new Task(() =>
{
while (_currentSubscriptions.Count != 0)
{
_consumer.Poll(TimeSpan.FromSeconds(1));
}
_polling = false;
}, TaskCreationOptions.LongRunning).Start();
}
}
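For comparison, the two-state guard from the first implementation can be kept while still handing back a Task for callers who want to inspect it. This is only a sketch: RestartablePoller is a hypothetical name, and the two delegates stand in for the subscription check and the consumer's Poll call from the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type, for illustration only.
public sealed class RestartablePoller
{
    private readonly Func<bool> _hasSubscriptions;
    private readonly Action _pollOnce;
    private int _polling; // 0 = idle, 1 = polling

    public RestartablePoller(Func<bool> hasSubscriptions, Action pollOnce)
    {
        _hasSubscriptions = hasSubscriptions;
        _pollOnce = pollOnce;
    }

    // Starts a new polling task unless one is already running.
    public Task Poll()
    {
        // Atomically flip 0 -> 1; if it was already 1, a loop is running.
        if (Interlocked.CompareExchange(ref _polling, 1, 0) != 0)
        {
            return Task.CompletedTask;
        }

        // A fresh Task is created on every start, so nothing is re-used.
        return Task.Factory.StartNew(() =>
        {
            try
            {
                while (_hasSubscriptions())
                {
                    _pollOnce();
                }
            }
            finally
            {
                // Reset the flag even if polling throws.
                Volatile.Write(ref _polling, 0);
            }
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}
```

As in the original version, a subscription added in the short window between the last check and the flag reset is not picked up until Poll is called again.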

non reentrant observable in c#

Given the following method:
If I leave the hack in place, my unit test completes immediately with "observable has no data".
If I take the hack out, there are multiple threads all attempting to login at the same time.
The host service does not allow this.
How do I ensure that only one thread is producing observables at any given point in time?
private static object obj = new object();
private static bool here = true;
public IObservable<Party> LoadAllParties(CancellationToken token)
{
var parties = Observable.Create<Party>(
async (observer, cancel) =>
{
// this is just a hack to test behavior
lock (obj)
{
if (!here)
return;
here = false;
}
// end of hack.
try
{
if (!await this.RequestLogin(observer, cancel))
return;
// request list.
await this._request.GetAsync(this._configuration.Url.RequestList);
if (this.IsCancelled(observer, cancel))
return;
while (!cancel.IsCancellationRequested)
{
var entities = await this._request.GetAsync(this._configuration.Url.ProcessList);
if (this.IsCancelled(observer, cancel))
return;
var tranche = this.ExtractParties(entities);
// break out if it's the last page.
if (!tranche.Any())
break;
Array.ForEach(tranche, observer.OnNext);
await this._request.GetAsync(this._configuration.Url.ProceedList);
if (this.IsCancelled(observer, cancel))
return;
}
observer.OnCompleted();
}
catch (Exception ex)
{
observer.OnError(ex);
}
});
return parties;
}
My Unit Test:
var sut = container.Resolve<SyncDataManager>();
var count = 0;
var token = new CancellationTokenSource();
var observable = sut.LoadAllParties(token.Token);
observable.Subscribe(party => count++);
await observable.ToTask(token.Token);
count.Should().BeGreaterThan(0);
I do think your question suffers from the XY problem: the code contains several calls to methods that are not included and that may have important side effects, so I feel that going on the information available won't lead to the best advice.
That said, I suspect you did not intend to subscribe to the observable twice: once with the explicit Subscribe call, and once with the ToTask() call. This would certainly explain the concurrent calls, which are occurring in two different subscriptions.
EDIT:
How about asserting on the length instead (tweak the timeout to suit):
var length = await observable.Count().Timeout(TimeSpan.FromSeconds(3));
Better would be to look into Rx-Testing and mock your dependencies. That's a big topic, but this long blog post from the Rx team explains it very well and this answer regarding TPL-Rx interplay may help: Executing TPL code in a reactive pipeline and controlling execution via test scheduler

Azure ServiceBus & async - To be, or not to be?

I'm running Service Bus on Azure, pumping about 10-100 messages per second.
Recently I switched to .NET 4.5 and, all excited, refactored all the code to have 'async' and 'await' at least twice in each line to make sure it's done 'properly' :)
Now I'm wondering whether it's actually for better or for worse. Could you have a look at the code snippets and let me know what your thoughts are? I'm especially worried that the thread context switching from all the asynchrony is giving me more grief than benefit... (looking at !dumpheap, it's definitely a factor)
Just a bit of description: I will be posting two methods, one that runs a while loop on a ConcurrentQueue, waiting for new messages, and another that sends one message at a time. I'm also using the Transient Fault Handling block exactly as Dr. Azure prescribed.
Sending loop (started at the beginning, waiting for new messages):
private async void SendingLoop()
{
try
{
await this.RecreateMessageFactory();
this.loopSemaphore.Reset();
Buffer<SendMessage> message = null;
while (true)
{
if (this.cancel.Token.IsCancellationRequested)
{
break;
}
this.semaphore.WaitOne();
if (this.cancel.Token.IsCancellationRequested)
{
break;
}
while (this.queue.TryDequeue(out message))
{
try
{
using (message)
{
//only send the latest message
if (!this.queue.IsEmpty)
{
this.Log.Debug("Skipping queued message, Topic: " + message.Value.Topic);
continue;
}
else
{
if (this.Topic == null || this.Topic.Path != message.Value.Topic)
await this.EnsureTopicExists(message.Value.Topic, this.cancel.Token);
if (this.cancel.Token.IsCancellationRequested)
break;
await this.SendMessage(message, this.cancel.Token);
}
}
}
catch (OperationCanceledException)
{
break;
}
catch (Exception ex)
{
ex.LogError();
}
}
}
}
catch (OperationCanceledException)
{ }
catch (Exception ex)
{
ex.LogError();
}
finally
{
if (this.loopSemaphore != null)
this.loopSemaphore.Set();
}
}
Sending a message:
private async Task SendMessage(Buffer<SendMessage> message, CancellationToken cancellationToken)
{
//this.Log.Debug("MessageBroadcaster.SendMessage to " + this.GetTopic());
bool entityNotFound = false;
if (this.MessageSender.IsClosed)
{
//this.Log.Debug("MessageBroadcaster.SendMessage MessageSender closed, recreating " + this.GetTopic());
await this.EnsureMessageSender(cancellationToken);
}
try
{
await this.sendMessageRetryPolicy.ExecuteAsync(async () =>
{
message.Value.Body.Seek(0, SeekOrigin.Begin);
using (var msg = new BrokeredMessage(message.Value.Body, false))
{
await Task.Factory.FromAsync(this.MessageSender.BeginSend, this.MessageSender.EndSend, msg, null);
}
}, cancellationToken);
}
catch (MessagingEntityNotFoundException)
{
entityNotFound = true;
}
catch (OperationCanceledException)
{ }
catch (ObjectDisposedException)
{ }
catch (Exception ex)
{
ex.LogError();
}
if (entityNotFound)
{
if (!cancellationToken.IsCancellationRequested)
{
await this.EnsureTopicExists(message.Value.Topic, cancellationToken);
}
}
}
The code above is from a 'Sender' class that sends 1 message/second. I have about 50-100 instances running at any given time, so it could be quite a number of threads.
Btw do not worry about EnsureMessageSender, RecreateMessageFactory, EnsureTopicExists too much, they are not called that often.
Would I not be better off just having one background thread working through the message queue and sending messages synchronously, given that all I need is to send one message at a time, and so not worry about the async stuff and avoid the overhead that comes with it?
Note that usually it's a matter of milliseconds to send one Message to Azure Service Bus, it's not really expensive. (Except at times when it's slow, times out or there is a problem with Service Bus backend, it could be hanging for a while trying to send stuff).
Thanks and sorry for the long post,
Stevo
Proposed Solution
Would this example be a solution to my situation?
static void Main(string[] args)
{
var broadcaster = new BufferBlock<int>(); //queue
var cancel = new CancellationTokenSource();
var run = Task.Run(async () =>
{
try
{
while (true)
{
//check if we are not finished
if (cancel.IsCancellationRequested)
break;
//async wait until a value is available
var val = await broadcaster.ReceiveAsync(cancel.Token).ConfigureAwait(false);
int next = 0;
//greedy - eat up and ignore all the values but last
while (broadcaster.TryReceive(out next))
{
Console.WriteLine("Skipping " + val);
val = next;
}
//check if we are not finished
if (cancel.IsCancellationRequested)
break;
Console.WriteLine("Sending " + val);
//simulate sending delay
await Task.Delay(1000).ConfigureAwait(false);
Console.WriteLine("Value sent " + val);
}
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}, cancel.Token);
//simulate sending messages, one every 200 ms
for (int i = 0; i < 20; i++)
{
Console.WriteLine("Broadcasting " + i);
broadcaster.Post(i);
Thread.Sleep(200);
}
cancel.Cancel();
run.Wait();
}
You say:
The code above is from a 'Sender' class that sends 1 message/second. I have about 50-100 instances running at any given time, so it could be quite a number of threads.
This is a good case for async. You save lots of threads here. Async reduces context switching because it is not thread-based. It does not context-switch in case of something requiring a wait. Instead, the next work item is being processed on the same thread (if there is one).
For that reason your async solution will definitely scale better than a synchronous one. Whether it actually uses less CPU at 50-100 instances of your workflow needs to be measured. The more instances there are, the more likely async is to be faster.
Now, there is one problem with the implementation: You're using a ConcurrentQueue which is not async-ready. So you actually do use 50-100 threads even in your async version. They will either block (which you wanted to avoid) or busy-wait burning 100% CPU (which seems to be the case in your implementation!). You need to get rid of this problem and make the queuing async, too. Maybe a SemaphoreSlim is of help here as it can be waited on asynchronously.
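The SemaphoreSlim idea mentioned above could look roughly like the following: the queue holds the items, and the semaphore counts them so that consumers can wait asynchronously instead of blocking a thread. This is a minimal sketch with a hypothetical AsyncQueue type, not a replacement for a tested library queue.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type, for illustration only.
public sealed class AsyncQueue<T>
{
    private readonly ConcurrentQueue<T> _items = new ConcurrentQueue<T>();
    private readonly SemaphoreSlim _available = new SemaphoreSlim(0);

    public void Enqueue(T item)
    {
        // Add the item first, then publish one permit for it.
        _items.Enqueue(item);
        _available.Release();
    }

    public async Task<T> DequeueAsync(CancellationToken cancellationToken = default)
    {
        // Waits without blocking a thread until an item has been enqueued.
        await _available.WaitAsync(cancellationToken).ConfigureAwait(false);

        // Each permit corresponds to exactly one enqueued item,
        // so this dequeue always finds something.
        _items.TryDequeue(out var item);
        return item;
    }
}
```

A sending loop built on this would await DequeueAsync where the original code calls semaphore.WaitOne(), so an idle sender holds no thread while it waits.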
First, keep in mind that Task != Thread. Tasks (and async method continuations) are scheduled to the thread pool, where Microsoft has put in tons of optimizations that work wonders as long as your tasks are fairly short.
Reviewing your code, one line raises a flag: semaphore.WaitOne. I assume you're using this as a kind of signal that there is data available in the queue. This is bad because it's a blocking wait inside an async method. By using a blocking wait, the code changes from a lightweight continuation into a much heavier thread pool thread.
So, I would follow @usr's recommendation and replace the queue (and the semaphore) with an async-ready queue. TPL Dataflow's BufferBlock<T> is an async-ready producer/consumer queue available via NuGet. I recommend this one first because it sounds like your project could benefit from using dataflow more extensively than just as a queue (but the queue is a fine place to start).
Other async-ready data structures exist; my AsyncEx library has a couple of them. It's also not hard to build a simple one yourself; I have a blog post on the subject. But I recommend TPL Dataflow in your situation.
