I'm trying to rewrite some code using Reactive Extensions for .NET but I need some guidance on how to achieve my goal.
I have a class that encapsulates some asynchronous behavior in a low level library. Think something that either reads or writes the network. When the class is started it will try to connect to the environment and when succesful it will signal this back by calling from a worker thread.
I want to turn this asynchronous behavior into a synchronous call and I have created a greatly simplified example below on how that can be achieved:
ManualResetEvent readyEvent = new ManualResetEvent(false);
public void Start(TimeSpan timeout) {
// Simulate a background process
ThreadPool.QueueUserWorkItem(_ => AsyncStart(TimeSpan.FromSeconds(1)));
// Wait for startup to complete.
if (!this.readyEvent.WaitOne(timeout))
throw new TimeoutException();
}
void AsyncStart(TimeSpan delay) {
Thread.Sleep(delay); // Simulate startup delay.
this.readyEvent.Set();
}
Running AsyncStart on a worker thread is just a way to simulate the asynchronous behavior of the library and is not part of my real code where the low level library supplies the thread and calls my code on a callback.
Notice that the Start method will throw a TimeoutException if start hasn't completed within the timeout interval.
I want to rewrite this code to use Rx. Here is my first attempt:
Subject<Unit> readySubject = new Subject<Unit>();
public void Start(TimeSpan timeout) {
ThreadPool.QueueUserWorkItem(_ => AsyncStart(TimeSpan.FromSeconds(1)));
// Point A - see below
this.readySubject.Timeout(timeout).First();
}
void AsyncStart(TimeSpan delay) {
Thread.Sleep(delay);
this.readySubject.OnNext(new Unit());
}
This is a decent attempt but unfortunately it contains a race condition. If the startup completes fast (e.g. if delay is 0) and if there is an additonal delay at point A then OnNext will be called on readySubject before First has executed. In essence the IObservable I'm applying Timeout and First never sees that startup has completed and a TimeoutException will be thrown instead.
It seems that Observable.Defer has been created to handle problems like this. Here is slightly more complex attempt to use Rx:
Subject<Unit> readySubject = new Subject<Unit>();
void Start(TimeSpan timeout) {
var ready = Observable.Defer(() => {
ThreadPool.QueueUserWorkItem(_ => AsyncStart(TimeSpan.FromSeconds(1)));
// Point B - see below
return this.readySubject.AsObservable();
});
ready.Timeout(timeout).First();
}
void AsyncStart(TimeSpan delay) {
Thread.Sleep(delay);
this.readySubject.OnNext(new Unit());
}
Now the asynchronous operation is not started immediately but only when the IObservable is being used. Unfortunately there is still a race condition but this time at point B. If the asynchronous operation started calls OnNext before the Defer lambda returns it is still lost and a TimeoutException will be thrown by Timeout.
I know I can use operators like Replay to buffer events but my initial example without Rx doesn't use any kind of buffering. Is there a way for me to use Rx to solve my problem without race conditions? In essence starting the asynchronous operation only after the IObservable has been connected to in this case Timeout and First?
Based on Ana Betts's answer here is working solution:
void Start(TimeSpan timeout) {
var readySubject = new AsyncSubject<Unit>();
ThreadPool.QueueUserWorkItem(_ => AsyncStart(readySubject, TimeSpan.FromSeconds(1)));
// Point C - see below
readySubject.Timeout(timeout).First();
}
void AsyncStart(ISubject<Unit> readySubject, TimeSpan delay) {
Thread.Sleep(delay);
readySubject.OnNext(new Unit());
readySubject.OnCompleted();
}
The interesting part is when there is a delay at point C that is longer than the time it takes for AsyncStart to complete. AsyncSubject retains the last notification sent and Timeout and First will still perform as expected.
So, one thing to know about Rx I think a lot of people do at first (myself included!): if you're using any traditional threading function like ResetEvents, Thread.Sleeps, or whatever, you're Doing It Wrong (tm) - it's like casting things to Arrays in LINQ because you know that the underlying type happens to be an array.
The key thing to know is that an async func is represented by a function that returns IObservable<TResult> - that's the magic sauce that lets you signal when something has completed. So here's how you'd "Rx-ify" a more traditional async func, like you'd see in a Silverlight web service:
IObservable<byte[]> readFromNetwork()
{
var ret = new AsyncSubject();
// Here's a traditional async function that you provide a callback to
asyncReaderFunc(theFile, buffer => {
ret.OnNext(buffer);
ret.OnCompleted();
});
return ret;
}
This is a decent attempt but unfortunately it contains a race condition.
This is where AsyncSubject comes in - this makes sure that even if asyncReaderFunc beats the Subscribe to the punch, AsyncSubject will still "replay" what happened.
So, now that we've got our function, we can do lots of interesting things to it:
// Make it into a sync function
byte[] results = readFromNetwork().First();
// Keep reading blocks one at a time until we run out
readFromNetwork().Repeat().TakeUntil(x => x == null || x.Length == 0).Subscribe(bytes => {
Console.WriteLine("Read {0} bytes in chunk", bytes.Length);
})
// Read the entire stream and get notified when the whole deal is finished
readFromNetwork()
.Repeat().TakeUntil(x => x == null || x.Length == 0)
.Aggregate(new MemoryStream(), (ms, bytes) => ms.Write(bytes))
.Subscribe(ms => {
Console.WriteLine("Got {0} bytes in total", ms.ToArray().Length);
});
// Or just get the entire thing as a MemoryStream and wait for it
var memoryStream = readFromNetwork()
.Repeat().TakeUntil(x => x == null || x.Length == 0)
.Aggregate(new MemoryStream(), (ms, bytes) => ms.Write(bytes))
.First();
I would further add to Paul's comment of adding WaitHandles means you are doing it wrong, that using Subjects directly usually means you are doing it wrong too. ;-)
Try to consider your Rx code working with sequences or pipelines. Subjects offer read and write capabilities which means you are no longer working with a pipeline or a sequence anymore (unless you have pipleines that go both ways or sequences that can reverse?!?)
So first Paul's code is pretty cool, but let's "Rx the hell out of it".
1st The AsyncStart method change it to this
IObservable<Unit> AsyncStart(TimeSpan delay)
{
Observable.Timer(delay).Select(_=>Unit.Default);
}
So easy! Look no subjects and data only flows one way. The important thing here is the signature change. It will push stuff to us. This is now very explicit. Passing in a Subject to me is very ambiguous.
2nd. We now dont need the Subject defined in the start method. We can also leverage the Scheduler features instead of the old-skool ThreadPool.QueueUserWorkItem.
void Start(TimeSpan timeout)
{
var isReady = AsyncStart(TimeSpan.FromSeconds(1))
.SubscribeOn(Scheduler.ThreadPool)
.PublishLast();
isReady.Connect();
isReady.Timeout(timeout).First();
}
Now we have a clear pipeline or sequence of events
AsyncStart --> isReady --> Start
Instead of Start-->AsyncStart-->Start
If I knew more of your problem space, I am sure we could come up with an even better way of doing this that did not require the blocking nature of the start method. The more you use Rx the more you will find that your old assumptions on when you need to block, use waithandles, etc can be thrown out the window.
Related
I have a server which communicates with 50 or more devices over TCP LAN. There is a Task.Run for each socket reading message loop.
I buffer each message reach into a blocking queue, where each blocking queue has a Task.Run using a BlockingCollection.Take().
So something like (semi-pseudocode):
Socket Reading Task
Task.Run(() =>
{
while (notCancelled)
{
element = ReadXml();
switch (element)
{
case messageheader:
MessageBlockingQueue.Add(deserialze<messageType>());
...
}
}
});
Message Buffer Task
Task.Run(() =>
{
while (notCancelled)
{
Process(MessageQueue.Take());
}
});
So that would make 50+ reading tasks and 50+ tasks blocking on their own buffers.
I did it this way to avoid blocking the reading loop and allow the program to distribute processing time on messages more fairly, or so I believe.
Is this an inefficient way to handle it? what would be a better way?
You may be interested in the "channels" work, in particular: System.Threading.Channels. The aim of this is to provider asynchronous producer/consumer queues, covering both single and multiple producer and consumer scenarios, upper limits, etc. By using an asynchronous API, you aren't tying up lots of threads just waiting for something to do.
Your read loop would become:
while (notCancelled) {
var next = await queue.Reader.ReadAsync(optionalCancellationToken);
Process(next);
}
and the producer:
switch (element)
{
case messageheader:
queue.Writer.TryWrite(deserialze<messageType>());
...
}
so: minimal changes
Alternatively - or in combination - you could look into things like "pipelines" (https://www.nuget.org/packages/System.IO.Pipelines/) - since you're dealing with TCP data, this would be an ideal fit, and is something I've looked at for the custom web-socket server here on Stack Overflow (which deals with huge numbers of connections). Since the API is async throughout, it does a good job of balancing work - and the pipelines API is engineered with typical TCP scenarios in mind, for example partially consuming incoming data streams as you detect frame boundaries. I've written about this usage a lot, with code examples mostly here. Note that "pipelines" doesn't include a direct TCP layer, but the "kestrel" server includes one, or the third-party library https://www.nuget.org/packages/Pipelines.Sockets.Unofficial/ does (disclosure: I wrote it).
I actually do something similar in another project. What I learned or would do differently are the following:
First of all, better to use dedicated threads for the reading/writing loop (with new Thread(ParameterizedThreadStart)) because Task.Run uses a pool thread and as you use it in a (nearly) endless loop the thread is practically never returned to the pool.
var thread = new Thread(ReaderLoop) { Name = nameof(ReaderLoop) }; // priority, etc if needed
thread.Start(cancellationToken);
Your Process can be an event, which you can invoke asynchronously so your reader loop can be return immediately to process the new incoming packages as fast as possible:
private void ReaderLoop(object state)
{
var token = (CancellationToken)state;
while (!token.IsCancellationRequested)
{
try
{
var message = MessageQueue.Take(token);
OnMessageReceived(new MessageReceivedEventArgs(message));
}
catch (OperationCanceledException)
{
if (!disposed && IsRunning)
Stop();
break;
}
}
}
Please note that if a delegate has multiple targets it's async invocation is not trivial. I created this extension method for invoking a delegate on pool threads:
public static void InvokeAsync<TEventArgs>(this EventHandler<TEventArgs> eventHandler, object sender, TEventArgs args)
{
void Callback(IAsyncResult ar)
{
var method = (EventHandler<TEventArgs>)ar.AsyncState;
try
{
method.EndInvoke(ar);
}
catch (Exception e)
{
HandleError(e, method);
}
}
foreach (EventHandler<TEventArgs> handler in eventHandler.GetInvocationList())
handler.BeginInvoke(sender, args, Callback, handler);
}
So the OnMessageReceived implementation can be:
protected virtual void OnMessageReceived(MessageReceivedEventArgs e)
=> messageReceivedHandler.InvokeAsync(this, e);
Finally it was a big lesson that BlockingCollection<T> has some performance issues. It uses SpinWait internally, whose SpinOnce method waits longer and longer times if there is no incoming data for a long time. This is a tricky issue because even if you log every single step of the processing you will not notice that everything is started delayed unless you can mock also the server side. Here you can find a fast BlockingCollection implementation using an AutoResetEvent for triggering incoming data. I added a Take(CancellationToken) overload to it as follows:
/// <summary>
/// Takes an item from the <see cref="FastBlockingCollection{T}"/>
/// </summary>
public T Take(CancellationToken token)
{
T item;
while (!queue.TryDequeue(out item))
{
waitHandle.WaitOne(cancellationCheckTimeout); // can be 10-100 ms
token.ThrowIfCancellationRequested();
}
return item;
}
Basically that's it. Maybe not everything is applicable in your case, eg. if the nearly immediate response is not crucial the regular BlockingCollection also will do it.
Yes, this is a bit inefficient, because you block ThreadPool threads.
I already discussed this problem Using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern
You can also look at examples with testing a producer -consumer pattern here:
https://github.com/BBGONE/TestThreadAffinity
You can use await Task.Yield in the loop to give other tasks access to this thread.
You can solve it also by using dedicated threads or better a custom ThreadScheduler which uses its own thread pool. But it is ineffective to create 50+ plain threads. Better to adjust the task, so it would be more cooperative.
If you use a BlockingCollection (because it can block the thread for long while waiting to write (if bounded) or to read or no items to read) then it is better to use System.Threading.Tasks.Channels https://github.com/stephentoub/corefxlab/blob/master/src/System.Threading.Tasks.Channels/README.md
They don't block the thread while waiting when the collection will be available to write or to read. There's an example how it is used https://github.com/BBGONE/TestThreadAffinity/tree/master/ThreadingChannelsCoreFX/ChannelsTest
This might not be specific to SemaphoreSlim exclusively, but basically my question is about whether there is a difference between the below two methods of throttling a collection of long running tasks, and if so, what that difference is (and when if ever to use either).
In the example below, let's say that each tracked task involves loading data from a Url (totally made up example, but is a common one that I've found for SemaphoreSlim examples).
The main difference comes down to how the individual tasks are added to the list of tracked tasks. In the first example, we call Task.Run() with a lambda, whereas in the second, we new up a Func(<Task<Result>>()) with a lambda and then immediately call that func and add the result to the tracked task list.
Examples:
Using Task.Run():
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
await ss.WaitAsync().ConfigureAwait(false);
trackedTasks.Add(Task.Run(async () =>
{
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
}));
}
var results = await Task.WhenAll(trackedTasks);
Using a new Func:
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
trackedTasks.Add(new Func<Task<Result>>(async () =>
{
await ss.WaitAsync().ConfigureAwait(false);
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
})());
}
var results = await Task.WhenAll(trackedTasks);
There are two differences:
Task.Run does error handling
First off all, when you call the lambda, it runs. On the other hand, Task.Run would call it. This is relevant because Task.Run does a bit of work behind the scenes. The main work it does is handling a faulted task...
If you call a lambda, and the lambda throws, it would throw before you add the Task to the list...
However, in your case, because your lambda is async, the compiler would create the Task for it (you are not making it by hand), and it will correctly handle the exception and make it available via the returned Task. Therefore this point is moot.
Task.Run prevents task attachment
Task.Run sets DenyChildAttach. This means that the tasks created inside the Task.Run run independently from (are not synchronized with) the returned Task.
For example, this code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(Task.Run(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
}));
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in unknown order. However the following code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(new Func<Task<int>>(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
})());
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in order, every time. This is odd, right? What happens is that the inner task is attached to outer one, and executed right away in the same thread. But if you use Task.Run, the inner task is not attached and scheduled independently.
This remain true even if you use await, as long as the task you await does not go to an external system...
What happens with external system? Well, for example, if your task is reading from an URL - as in your example - the system would create a TaskCompletionSource, get the Task from it, set a response handler that writes the result to the TaskCompletionSource, make the request, and return the Task. This Task is not scheduled, it running on the same thread as a parent task makes no sense. And thus, it can break the order.
Since, you are using await to wait on an external system, this point is moot too.
Conclusion
I must conclude that these are equivalent.
If you want to be safe, and make sure it works as expected, even if - in a future version - some of the above points stops being moot, then keep Task.Run. On the other hand, if you really want to optimize, use the lambda and avoid the Task.Run (very small) overhead. However, that probably won't be a bottleneck.
Addendum
When I talk about a task that goes to an external system, I refer to something that runs outside of .NET. There a bit of code that will run in .NET to interface with the external system, but the bulk of the code will not run in .NET, and thus will not be in a managed thread at all.
The consumer of the API specify nothing for this to happen. The task would be a promise task, but that is not exposed, for the consumer there is nothing special about it.
In fact, a task that goes to an external system may barely run in the CPU at all. Futhermore, it might just be waiting on something exterior to the computer (it could be the network or user input).
The pattern is as follows:
The library creates a TaskCompletionSource.
The library sets a means to recieve a notification. It can be a callback, event, message loop, hook, listening to a socket, a pipe line, waiting on a global mutex... whatever is necesary.
The library sets code to react to the notification that will call SetResult, or SetException on the TaskCompletionSource as appropiate for the notification recieved.
The library does the actual call to the external system.
The library returns TaskCompletionSource.Task.
Note: with extra care of optimization not reordering things where it should not, and with care of handling errors during the setup phase. Also, if a CancellationToken is involved, it has to be taken into account (and call SetCancelled on the TaskCompletionSource when appropiate). Also, there could be tear down necesary in the reaction to the notification (or on cancellation). Ah, do not forget to validate your parameters.
Then the external system goes and does whatever it does. Then when it finishes, or something goes wrong, gives the library the notification, and your Task is sudendtly completed, faulted... (or if cancellation happened, your Task is now cancelled) and .NET will schedule the continuations of the task as needed.
Note: async/await uses continuations behind the scenes, that is how execution resumes.
Incidentally, if you wanted to implement SempahoreSlim yourself, you would have to do something very similar to what I describe above. You can see it in my backport of SemaphoreSlim.
Let us see a couple of examples of promise tasks...
Task.Delay: when we are waiting with Task.Delay, the CPU is not spinning. This is not running in a thread. In this case the notification mechanism will be an OS timer. When the OS sees that the time of the timer has elapsed, it will call into the CLR, and then the CLR will mark the task as completed. What thread was waiting? none.
FileStream.ReadSync: when we are reading from storage with FileStream.ReadSync the actual work is done by the device. The CRL has to declare a custom event, then pass the event, the file handle and the buffer to the OS... the OS calls the device driver, the device driver interfaces with the device. As the storage device recovers the information, it will write to memory (directly on the specified buffer) via DMA technology. And when it is done, it will set an interruption, that is handled by the driver, that notifies the OS, that calls the custom event, that marks the task as completed. What thread did read the data from storage? none.
A similar pattern will be used to download from a web page, except, this time the device goes to the network. How to make an HTTP request and how the system waits for a response is beyond the scope of this answer.
It is also possible that the external system is another program, in which case it would run on a thread. But it won't be a managed thread on your process.
Your take away is that these task do not run on any of your threads. And their timing might depend on external factors. Thus, it makes no sense to think of them as running in the same thread, or that we can predict their timing (well, except of course, in the case of the timer).
Both are not very good because they create the tasks immediately. The func version is a little less overhead since it saves the Task.Run route over the thread pool just to immediately end the thread pool work and suspend on the semaphore. You don't need an async Func, you could simplify this by using an async method (possibly a local function).
But you should not do this at all. Instead, use a helper method that implements a parallel async foreach.
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
Then you just go urls.ForEachAsync(myDop, async input => await ProcessAsync(input));
Here, the tasks are created on demand. You can even make the input stream lazy.
I'm working on a SMS-based game (Value Added Service), in which a question must be sent to each subscriber on a daily basis. There are over 500,000 subscribers and therefore performance is a key factor. Since each subscriber can be a difference state of the competition with different variables, database must be queried separately for each subscriber before sending a text message. To achieve the best performance I'm using .Net Task Parallel Library (TPL) to spawn parallel threadpool threads and do as much async operations as possible in each thread to finally send texts asap.
Before describing the actual problem there are some more information necessary to give about the code.
At first there was no async operation in the code. I just scheduled some 500,000 tasks with the default task scheduler into the Threadpool and each task would work through the routines, blocking on all EF (Entity Framework) queries and sequentially finishing its job. It was good, but not fast enough. Then I changed all EF queries to Async, the outcome was superb in speed but there has been so many deadlocks and timeouts in SQL server that about a third of the subscribers never received a text! After trying different solutions, I decided not to do too many Async Database operations while I have over 500,000 tasks running on a 24 core server (with at least 24 concurrent threadpool threads)!
I rolled back all the changes (the Asycn ones) expect for one web service call in each task which remained Async.
Now the weird case:
In my code, I have a boolean variable named "isCrossSellActive". When the variable is set some more DB operations take place and an asycn webservice call will happen on which the thread awaits. When this variable is false, none of these operations will happen including the async webservice call. Awkwardly when the variable is set the code runs so much faster than when it's not! It seems like for some reason the awaited async code (the cooperative thread) is making the code faster.
Here is the code:
public async Task AutoSendMessages(...)
{
//Get list of subscriptions plus some initialization
LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(numberOfThreads);
TaskFactory taskFactory = new TaskFactory(lcts);
List<Task> tasks = new List<Task>();
//....
foreach (var sub in subscriptions)
{
AutoSendData data = new AutoSendData
{
ServiceId = serviceId,
MSISDN = sub.subscriber,
IsCrossSellActive = bolCrossSellHeader
};
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
}
GC.Collect();
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException ae)
{
ae.Handle((ex) =>
{
_logRepo.LogException(1, "", ex);
return true;
});
}
await _autoSendRepo.SetAutoSendingStatusEnd(statusId);
}
public async Task SendQuestion(object data)
{
//extract variables from input parameter
try
{
if (isCrossSellActive)
{
int pieceCount = subscriptionRepo.GetSubscriberCarPieces(curSubscription.service, curSubscription.subscriber).Count(c => c.isConfirmed);
foreach (var rule in csRules)
{
if (rule.Applies)
{
if (await HttpClientHelper.GetJsonAsync<bool>(url, rule.TargetServiceBaseAddress))
{
int noOfAddedPieces = SomeCalculations();
if (noOfAddedPieces > 0)
{
crossSellRepo.SetPromissedPieces(curSubscription.subscriber, curSubscription.service,
rule.TargetShortCode, noOfAddedPieces, 0, rule.ExpirationLimitDays);
}
}
}
}
}
// The rest of the code. (Some db CRUD)
await SmsClient.SendSoapMessage(subscriber, smsBody);
}
catch (Exception ex){//...}
}
Ok, thanks to #usr and the clue he gave me, the problem is finally solved!
His comment drew my attention to the awaited taskFactory.StartNew(...) line which sequentially adds new tasks to the "tasks" list which is then awaited on by Task.WaitAll(tasks);
At first I removed the await keyword before the taskFactory.StartNew() and it led the code towards a horrible state of malfunction! I then returned the await keyword to before taskFactory.StartNew() and debugged the code using breakpoints and amazingly saw that the threads are ran one after another and sequentially before the first thread reaches the first await inside the "SendQuestion" routine. When the "isCrossSellActive" flag was set despite the more jobs a thread should do the first await keyword is reached earlier thus enabling the next scheduled task to run. But when its not set the only await keyword is the last line of the routine so its most likely to run sequentially to the end.
usr's suggestion to remove the await keyword in the for loop seemed to be correct but the problem was the Task.WaitAll() line would wait on the wrong list of Task<Task<void>> instead of Task<void>. I finally used Task.Run instead of TaskFactory.StartNew and everything changed. Now the service is working well. The final code inside the for loop is:
tasks.Add(Task.Run(async () =>
{
await SendQuestion(data);
}));
and the problem was solved.
Thank you all.
P.S. Read this article on Task.Run and why TaskFactory.StartNew is dangerous: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html
It's extremly hard to tell unless you add some profiling that tell you which code is taking longer now.
Without seeing more numbers my best guess would be that the SMS service doesn't like when you send too many requests in a short time and chokes. When you add the extra DB calls the extra delay make the sms service work better.
A few other small details:
await Task.WhenAll is usually a bit better than Task.WaitAll. WaitAll means the thread will sit around waiting. Making a deadlock slightly more likely.
Instead of:
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
You should be able to do
tasks.Add(SendQuestion(data));
I am trying to understand concurrency by doing it in code. I have a code snippet which I thought was running asynchronously. But when I put the debug writeline statements in, I found that it is running synchronously. Can someone explain what I need to do differently to push ComputeBB() onto another thread using Task.Something?
Clarification I want this code to run ComputeBB in some other thread so that the main thread will keep on running without blocking.
Here is the code:
{
// part of the calling method
Debug.WriteLine("About to call ComputeBB");
returnDTM.myBoundingBox = await Task.Run(() => returnDTM.ComputeBB());
Debug.WriteLine("Just called await ComputBB.");
return returnDTM;
}
private ptsBoundingBox2d ComputeBB()
{
Debug.WriteLine("Starting ComputeBB.");
Stopwatch sw = new Stopwatch(); sw.Start();
var point1 = this.allPoints.FirstOrDefault().Value;
var returnBB = new ptsBoundingBox2d(
point1.x, point1.y, point1.z, point1.x, point1.y, point1.z);
Parallel.ForEach(this.allPoints,
p => returnBB.expandByPoint(p.Value.x, p.Value.y, p.Value.z)
);
sw.Stop();
Debug.WriteLine(String.Format("Compute BB took {0}", sw.Elapsed));
return returnBB;
}
Here is the output in the immediate window:
About to call ComputeBB
Starting ComputeBB.
Compute BB took 00:00:00.1790574
Just called await ComputBB.
Clarification If it were really running asynchronously it would be in this order:
About to call ComputeBB
Just called await ComputBB.
Starting ComputeBB.
Compute BB took 00:00:00.1790574
But it is not.
Elaboration
The calling code has signature like so: private static async Task loadAsBinaryAsync(string fileName) At the next level up, though, I attempt to stop using async. So here is the call stack from top to bottom:
static void Main(string[] args)
{
aTinFile = ptsDTM.CreateFromExistingFile("TestSave.ptsTin");
// more stuff
}
public static ptsDTM CreateFromExistingFile(string fileName)
{
ptsDTM returnTin = new ptsDTM();
Task<ptsDTM> tsk = Task.Run(() => loadAsBinaryAsync(fileName));
returnTin = tsk.Result; // I suspect the problem is here.
return retunTin;
}
private static async Task<ptsDTM> loadAsBinaryAsync(string fileName)
{
// do a lot of processing
Debug.WriteLine("About to call ComputeBB");
returnDTM.myBoundingBox = await Task.Run(() => returnDTM.ComputeBB());
Debug.WriteLine("Just called await ComputBB.");
return returnDTM;
}
I have a code snippet which I thought was running asynchronously. But when I put the debug writeline statements in, I found that it is running synchronously.
await is used to asynchronously wait an operations completion. While doing so, it yields control back to the calling method until it's completion.
what I need to do differently to push ComputeBB() onto another thread
It is already ran on a thread pool thread. If you don't want to asynchronously wait on it in a "fire and forget" fashion, don't await the expression. Note this will have an effect on exception handling. Any exception which occurs inside the provided delegate would be captured inside the given Task, if you don't await, there is a chance they will go about unhandled.
Edit:
Lets look at this piece of code:
public static ptsDTM CreateFromExistingFile(string fileName)
{
ptsDTM returnTin = new ptsDTM();
Task<ptsDTM> tsk = Task.Run(() => loadAsBinaryAsync(fileName));
returnTin = tsk.Result; // I suspect the problem is here.
return retunTin;
}
What you're currently doing is synchronously blocking when you use tsk.Result. Also, for some reason you're calling Task.Run twice, once in each method. That is unnecessary. If you want to return your ptsDTM instance from CreateFromExistingFile, you will have to await it, there is no getting around that. "Fire and Forget" execution doesn't care about the result, at all. It simply wants to start whichever operation it needs, if it fails or succeeds is usually a non-concern. That is clearly not the case here.
You'll need to do something like this:
private PtsDtm LoadAsBinary(string fileName)
{
Debug.WriteLine("About to call ComputeBB");
returnDTM.myBoundingBox = returnDTM.ComputeBB();
Debug.WriteLine("Just called ComputeBB.");
return returnDTM;
}
And then somewhere up higher up the call stack, you don't actually need CreateFromExistingFiles, simply call:
Task.Run(() => LoadAsBinary(fileName));
When needed.
Also, please, read the C# naming conventions, which you're currently not following.
await's whole purpose is in adding the synchronicity back in asynchronous code. This allows you to easily partition the parts that are happenning synchronously and asynchronously. Your example is absurd in that it never takes any advantage whatsoever of this - if you just called the method directly instead of wrapping it in Task.Run and awaiting that, you would have had the exact same result (with less overhead).
Consider this, though:
await
Task.WhenAll
(
loadAsBinaryAsync(fileName1),
loadAsBinaryAsync(fileName2),
loadAsBinaryAsync(fileName3)
);
Again, you have the synchronicity back (await functions as the synchronization barrier), but you've actually performed three independent operations asynchronously with respect to each other.
Now, there's no reason to do something like this in your code, since you're using Parallel.ForEach at the bottom level - you're already using the CPU to the max (with unnecessary overhead, but let's ignore that for now).
So the basic usage of await is actually to handle asynchronous I/O rather than CPU work - apart from simplifying code that relies on some parts of CPU work being synchronised and some not (e.g. you have four threads of execution that simultaneously process different parts of the problem, but at some point have to be reunited to make sense of the individual parts - look at the Barrier class, for example). This includes stuff like "making sure the UI doesn't block while some CPU intensive operation happens in the background" - this makes the CPU work asynchronous with respect to the UI. But at some point, you still want to reintroduce the synchronicity, to make sure you can display the results of the work on the UI.
Consider this winforms code snippet:
async void btnDoStuff_Click(object sender, EventArgs e)
{
lblProgress.Text = "Calculating...";
var result = await DoTheUltraHardStuff();
lblProgress.Text = "Done! The result is " + result;
}
(note that the method is async void, not async Task nor async Task<T>)
What happens is that (on the GUI thread) the label is first assigned the text Calculating..., then the asynchronous DoTheUltraHardStuff method is scheduled, and then, the method returns. Immediately. This allows the GUI thread to do whatever it needs to do. However - as soon as the asynchronous task is complete and the GUI is free to handle the callback, the execution of btnDoStuff_Click will continue with the result already given (or an exception thrown, of course), back on the GUI thread, allowing you to set the label to the new text including the result of the asynchronous operation.
Asynchronicity is not an absolute property - stuff is asynchronous to some other stuff, and synchronous to some other stuff. It only makes sense with respect to some other stuff.
Hopefully, now you can go back to your original code and understand the part you've misunderstood before. The solutions are multiple, of course, but they depend a lot on how and why you're trying to do what you're trying to do. I suspect you don't actually need to use Task.Run or await at all - the Parallel.ForEach already tries to distribute the CPU work over multiple CPU cores, and the only thing you could do is to make sure other code doesn't have to wait for that work to finish - which would make a lot of sense in a GUI application, but I don't see how it would be useful in a console application with the singular purpose of calculating that single thing.
So yes, you can actually use await for fire-and-forget code - but only as part of code that doesn't prevent the code you want to continue from executing. For example, you could have code like this:
Task<string> result = SomeHardWorkAsync();
Debug.WriteLine("After calling SomeHardWorkAsync");
DoSomeOtherWorkInTheMeantime();
Debug.WriteLine("Done other work.");
Debug.WriteLine("Got result: " + (await result));
This allows SomeHardWorkAsync to execute asynchronously with respect to DoSomeOtherWorkInTheMeantime but not with respect to await result. And of course, you can use awaits in SomeHardWorkAsync without trashing the asynchronicity between SomeHardWorkAsync and DoSomeOtherWorkInTheMeantime.
The GUI example I've shown way above just takes advantage of handling the continuation as something that happens after the task completes, while ignoring the Task created in the async method (there really isn't much of a difference between using async void and async Task when you ignore the result). So for example, to fire-and-forget your method, you could use code like this:
async void Fire(string filename)
{
var result = await ProcessFileAsync(filename);
DoStuffWithResult(result);
}
Fire("MyFile");
This will cause DoStuffWithResult to execute as soon as result is ready, while the method Fire itself will return immediately after executing ProcessFileAsync (up to the first await or any explicit return someTask).
This pattern is usually frowned upon - there really isn't any reason to return void out of an async method (apart from event handlers); you could just as easily return Task (or even Task<T> depending on the scenario), and let the caller decide whether he wants his code to execute synchronously in respect to yours or not.
Again,
async Task FireAsync(string filename)
{
var result = await ProcessFileAsync(filename);
DoStuffWithResult(result);
}
Fire("MyFile");
does the same thing as using async void, except that the caller can decide what to do with the asynchronous task. Perhaps he wants to launch two of those in parallel and continue after all are done? He can just await Task.WhenAll(Fire("1"), Fire("2")). Or he just wants that stuff to happen completely asynchronously with respect to his code, so he'll just call Fire("1") and ignore the resulting Task (of course, ideally, you at the very least want to handle possible exceptions).
Microsoft just announced the new C# Async feature. Every example I've seen so far is about asynchronously downloading something from HTTP. Surely there are other important async things?
Suppose I'm not writing a new RSS client or Twitter app. What's interesting about C# Async for me?
Edit I had an Aha! moment while watching Anders' PDC session. In the past I have worked on programs that used "watcher" threads. These threads sit waiting for something to happen, like watching for a file to change. They aren't doing work, they're just idle, and notify the main thread when something happens. These threads could be replaced with await/async code in the new model.
Ooh, this sounds interesting. I'm not playing with the CTP just yet, just reviewing the whitepaper. After seeing Anders Hejlsberg's talk about it, I think I can see how it could prove useful.
As I understand, async makes writing asynchronous calls easier to read and implement. Very much in the same way writing iterators is easier right now (as opposed to writing out the functionality by hand). This is essential blocking processes since no useful work can be done, until it is unblocked. If you were downloading a file, you cannot do anything useful until you get that file letting the thread go to waste. Consider how one would call a function which you know will block for an undetermined length and returns some result, then process it (e.g., store the results in a file). How would you write that? Here's a simple example:
static object DoSomeBlockingOperation(object args)
{
// block for 5 minutes
Thread.Sleep(5 * 60 * 1000);
return args;
}
static void ProcessTheResult(object result)
{
Console.WriteLine(result);
}
static void CalculateAndProcess(object args)
{
// let's calculate! (synchronously)
object result = DoSomeBlockingOperation(args);
// let's process!
ProcessTheResult(result);
}
Ok good, we have it implemented. But wait, the calculation takes minutes to complete. What if we wanted to have an interactive application and do other things while the calculation took place (such as rendering the UI)? This is no good, since we called the function synchronously and we have to wait for it to finish effectively freezing the application since the thread is waiting to be unblocked.
Answer, call the function expensive function asynchronously. That way we're not bound to waiting for the blocking operation to complete. But how do we do that? We'd call the function asynchronously and register a callback function to be called when unblocked so we may process the result.
static void CalculateAndProcessAsyncOld(object args)
{
// obtain a delegate to call asynchronously
Func<object, object> calculate = DoSomeBlockingOperation;
// define the callback when the call completes so we can process afterwards
AsyncCallback cb = ar =>
{
Func<object, object> calc = (Func<object, object>)ar.AsyncState;
object result = calc.EndInvoke(ar);
// let's process!
ProcessTheResult(result);
};
// let's calculate! (asynchronously)
calculate.BeginInvoke(args, cb, calculate);
}
Note: Sure we could start another thread to do this but that would mean we're spawning a thread that just sits there waiting to be unblocked, then do some useful work. That would be a waste.
Now the call is asynchronous and we don't have to worry about waiting for the calculation to finish and process, it's done asynchronously. It will finish when it can. An alternative to calling code asynchronously directly, you could use a Task:
static void CalculateAndProcessAsyncTask(object args)
{
// create a task
Task<object> task = new Task<object>(DoSomeBlockingOperation, args);
// define the callback when the call completes so we can process afterwards
task.ContinueWith(t =>
{
// let's process!
ProcessTheResult(t.Result);
});
// let's calculate! (asynchronously)
task.Start();
}
Now we called our function asynchronously. But what did it take to get it that way? First of all, we needed the delegate/task to be able to call it asynchronously, we needed a callback function to be able to process the results, then call the function. We've turned a two line function call to much more just to call something asynchronously. Not only that, the logic in the code has gotten more complex then it was or could be. Although using a task helped simplify the process, we still needed to do stuff to make it happen. We just want to run asynchronously then process the result. Why can't we just do that? Well now we can:
// need to have an asynchronous version
static async Task<object> DoSomeBlockingOperationAsync(object args)
{
//it is my understanding that async will take this method and convert it to a task automatically
return DoSomeBlockingOperation(args);
}
static async void CalculateAndProcessAsyncNew(object args)
{
// let's calculate! (asynchronously)
object result = await DoSomeBlockingOperationAsync(args);
// let's process!
ProcessTheResult(result);
}
Now this was a very simplified example with simple operations (calculate, process). Imagine if each operation couldn't conveniently be put into a separate function but instead have hundreds of lines of code. That's a lot of added complexity just to gain the benefit of asynchronous calling.
Another practical example used in the whitepaper is using it on UI apps. Modified to use the above example:
private async void doCalculation_Click(object sender, RoutedEventArgs e) {
doCalculation.IsEnabled = false;
await DoSomeBlockingOperationAsync(GetArgs());
doCalculation.IsEnabled = true;
}
If you've done any UI programming (be it WinForms or WPF) and attempted to call an expensive function within a handler, you'll know this is handy. Using a background worker for this wouldn't be that much helpful since the background thread will be sitting there waiting until it can work.
Suppose you had a way to control some external device, let's say a printer. And you wanted to restart the device after a failure. Naturally it will take some time for the printer to start up and be ready for operation. You might have to account for the restart not helping and attempt to restart again. You have no choice but to wait for it. Not if you did it asynchronously.
static async void RestartPrinter()
{
Printer printer = GetPrinter();
do
{
printer.Restart();
printer = await printer.WaitUntilReadyAsync();
} while (printer.HasFailed);
}
Imagine writing the loop without async.
One last example I have. Imagine if you had to do multiple blocking operations in a function and wanted to call asynchronously. What would you prefer?
static void DoOperationsAsyncOld()
{
Task op1 = new Task(DoOperation1Async);
op1.ContinueWith(t1 =>
{
Task op2 = new Task(DoOperation2Async);
op2.ContinueWith(t2 =>
{
Task op3 = new Task(DoOperation3Async);
op3.ContinueWith(t3 =>
{
DoQuickOperation();
}
op3.Start();
}
op2.Start();
}
op1.Start();
}
static async void DoOperationsAsyncNew()
{
await DoOperation1Async();
await DoOperation2Async();
await DoOperation3Async();
DoQuickOperation();
}
Read the whitepaper, it actually has a lot of practical examples like writing parallel tasks and others.
I can't wait to start playing with this either in the CTP or when .NET 5.0 finally makes it out.
The main scenarios are any scenario that involves high latency. That is, lots of time between "ask for a result" and "obtain a result". Network requests are the most obvious example of high latency scenarios, followed closely by I/O in general, and then by lengthy computations that are CPU bound on another core.
However, there are potentially other scenarios that this technology will mesh nicely with. For example, consider scripting the logic of a FPS game. Suppose you have a button click event handler. When the player clicks the button you want to play a siren for two seconds to alert the enemies, and then open the door for ten seconds. Wouldn't it be nice to say something like:
button.Disable();
await siren.Activate();
await Delay(2000);
await siren.Deactivate();
await door.Open();
await Delay(10000);
await door.Close();
await Delay(1000);
button.Enable();
Each task gets queued up on the UI thread, so nothing blocks, and each one resumes the click handler at the right point after its job is finished.
I've found another nice use-case for this today: you can await user interaction.
For example, if one form has a button that opens another form:
Form toolWindow;
async void button_Click(object sender, EventArgs e) {
if (toolWindow != null) {
toolWindow.Focus();
} else {
toolWindow = new Form();
toolWindow.Show();
await toolWindow.OnClosed();
toolWindow = null;
}
}
Granted, this isn't really any simpler than
toolWindow.Closed += delegate { toolWindow = null; }
But I think it nicely demonstrates what await can do. And once the code in the event handler is non-trivial, await make programming much easier. Think about the user having to click a sequence of buttons:
async void ButtonSeries()
{
for (int i = 0; i < 10; i++) {
Button b = new Button();
b.Text = i.ToString();
this.Controls.Add(b);
await b.OnClick();
this.Controls.Remove(b);
}
}
Sure, you could do this with normal event handlers, but it would require you to take apart the loop and convert it into something much harder to understand.
Remember that await can be used with anything that gets completed at some point in the future. Here's the extension method Button.OnClick() to make the above work:
public static AwaitableEvent OnClick(this Button button)
{
return new AwaitableEvent(h => button.Click += h, h => button.Click -= h);
}
sealed class AwaitableEvent
{
Action<EventHandler> register, deregister;
public AwaitableEvent(Action<EventHandler> register, Action<EventHandler> deregister)
{
this.register = register;
this.deregister = deregister;
}
public EventAwaiter GetAwaiter()
{
return new EventAwaiter(this);
}
}
sealed class EventAwaiter
{
AwaitableEvent e;
public EventAwaiter(AwaitableEvent e) { this.e = e; }
Action callback;
public bool BeginAwait(Action callback)
{
this.callback = callback;
e.register(Handler);
return true;
}
public void Handler(object sender, EventArgs e)
{
callback();
}
public void EndAwait()
{
e.deregister(Handler);
}
}
Unfortunately it doesn't seem possible to add the GetAwaiter() method directly to EventHandler (allowing await button.Click;) because then the method wouldn't know how to register/deregister that event.
It's a bit of boilerplate, but the AwaitableEvent class can be re-used for all events (not just UI). And with a minor modification and adding some generics, you could allow retrieving the EventArgs:
MouseEventArgs e = await button.OnMouseDown();
I could see this being useful with some more complex UI gestures (drag'n'drop, mouse gestures, ...) - though you'd have to add support for cancelling the current gesture.
There are some samples and demos in the CTP that don't use the Net, and even some that don't do any I/O.
And it does apply to all multithreaded / parallel problem areas (that already exist).
Async and Await are a new (easier) way of structuring all parallel code, be it CPU-bound or I/O bound. The biggest improvement is in areas where before C#5 you had to use the APM (IAsyncResult) model, or the event model (BackgroundWorker, WebClient). I think that is why those examples lead the parade now.
A GUI clock is a good example; say you want to draw a clock, that updates the time shown every second. Conceptually, you want to write
while true do
sleep for 1 second
display the new time on the clock
and with await (or with F# async) to asynchronously sleep, you can write this code to run on the UI thread in a non-blocking fashion.
http://lorgonblog.wordpress.com/2010/03/27/f-async-on-the-client-side/
The async extensions are useful in some cases when you have an asynchronous operation. An asynchronous operation has a definite start and completion. When asynchronous operations complete, they may have a result or an error. (Cancellation is treated as a special kind of error).
Asynchronous operations are useful in three situations (broadly speaking):
Keeping your UI responsive. Any time you have a long-running operation (whether CPU-bound or I/O-bound), make it asynchronous.
Scaling your servers. Using asynchronous operations judiciously on the server side may help your severs to scale. e.g., asynchronous ASP.NET pages may make use of async operations. However, this is not always a win; you need to evaluate your scalability bottlenecks first.
Providing a clean asynchronous API in a library or shared code. async is excellent for reusability.
As you begin to adopt the async way of doing things, you'll find the third situation becoming more common. async code works best with other async code, so asynchronous code kind of "grows" through the codebase.
There are a couple of types of concurrency where async is not the best tool:
Parallelization. A parallel algorithm may use many cores (CPUs, GPUs, computers) to solve a problem more quickly.
Asynchronous events. Asynchronous events happen all the time, independent of your program. They often do not have a "completion." Normally, your program will subscribe to an asynchronous event stream, receive some number of updates, and then unsubscribe. Your program can treat the subscribe and unsubscribe as a "start" and "completion", but the actual event stream never really stops.
Parallel operations are best expressed using PLINQ or Parallel, since they have a lot of built-in support for partitioning, limited concurrency, etc. A parallel operation may easily be wrapped in an awaitable by running it from a ThreadPool thread (Task.Factory.StartNew).
Asynchronous events do not map well to asynchronous operations. One problem is that an asynchronous operation has a single result at its point of completion. Asynchronous events may have any number of updates. Rx is the natural language for dealing with asynchronous events.
There are some mappings from an Rx event stream to an asynchronous operation, but none of them are ideal for all situations. It's more natural to consume asynchronous operations by Rx, rather than the other way around. IMO, the best way of approaching this is to use asynchronous operations in your libraries and lower-level code as much as possible, and if you need Rx at some point, then use Rx from there on up.
Here is probably a good example of how not to use the new async feature (that's not writing a new RSS client or Twitter app), mid-method overload points in a virtual method call. To be honest, i am not sure there is any way to create more than a single overload point per method.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
namespace AsyncText
{
class Program
{
static void Main(string[] args)
{
Derived d = new Derived();
TaskEx.Run(() => d.DoStuff()).Wait();
System.Console.Read();
}
public class Base
{
protected string SomeData { get; set; }
protected async Task DeferProcessing()
{
await TaskEx.Run(() => Thread.Sleep(1) );
return;
}
public async virtual Task DoStuff() {
Console.WriteLine("Begin Base");
Console.WriteLine(SomeData);
await DeferProcessing();
Console.WriteLine("End Base");
Console.WriteLine(SomeData);
}
}
public class Derived : Base
{
public async override Task DoStuff()
{
Console.WriteLine("Begin Derived");
SomeData = "Hello";
var x = base.DoStuff();
SomeData = "World";
Console.WriteLine("Mid 1 Derived");
await x;
Console.WriteLine("EndDerived");
}
}
}
}
Output Is:
Begin Derived
Begin Base
Hello
Mid 1 Derived
End Base
World
EndDerived
With certain inheritance hierarchies (namely using command pattern) i find myself wanting to do stuff like this occasionally.
here is an article about showing how to use the 'async' syntax in a non-networked scenario that involves UI and multiple actions.