Throttling with SemaphoreSlim -- "Task.Run()" vs "new Func<Task>()" - c#

This might not be specific to SemaphoreSlim exclusively, but basically my question is about whether there is a difference between the below two methods of throttling a collection of long running tasks, and if so, what that difference is (and when if ever to use either).
In the example below, let's say that each tracked task involves loading data from a Url (totally made up example, but is a common one that I've found for SemaphoreSlim examples).
The main difference comes down to how the individual tasks are added to the list of tracked tasks. In the first example, we call Task.Run() with a lambda, whereas in the second, we new up a Func(<Task<Result>>()) with a lambda and then immediately call that func and add the result to the tracked task list.
Examples:
Using Task.Run():
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
await ss.WaitAsync().ConfigureAwait(false);
trackedTasks.Add(Task.Run(async () =>
{
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
}));
}
var results = await Task.WhenAll(trackedTasks);
Using a new Func:
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
trackedTasks.Add(new Func<Task<Result>>(async () =>
{
await ss.WaitAsync().ConfigureAwait(false);
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
})());
}
var results = await Task.WhenAll(trackedTasks);

There are two differences:
Task.Run does error handling
First off all, when you call the lambda, it runs. On the other hand, Task.Run would call it. This is relevant because Task.Run does a bit of work behind the scenes. The main work it does is handling a faulted task...
If you call a lambda, and the lambda throws, it would throw before you add the Task to the list...
However, in your case, because your lambda is async, the compiler would create the Task for it (you are not making it by hand), and it will correctly handle the exception and make it available via the returned Task. Therefore this point is moot.
Task.Run prevents task attachment
Task.Run sets DenyChildAttach. This means that the tasks created inside the Task.Run run independently from (are not synchronized with) the returned Task.
For example, this code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(Task.Run(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
}));
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in unknown order. However the following code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(new Func<Task<int>>(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
})());
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in order, every time. This is odd, right? What happens is that the inner task is attached to outer one, and executed right away in the same thread. But if you use Task.Run, the inner task is not attached and scheduled independently.
This remain true even if you use await, as long as the task you await does not go to an external system...
What happens with external system? Well, for example, if your task is reading from an URL - as in your example - the system would create a TaskCompletionSource, get the Task from it, set a response handler that writes the result to the TaskCompletionSource, make the request, and return the Task. This Task is not scheduled, it running on the same thread as a parent task makes no sense. And thus, it can break the order.
Since, you are using await to wait on an external system, this point is moot too.
Conclusion
I must conclude that these are equivalent.
If you want to be safe, and make sure it works as expected, even if - in a future version - some of the above points stops being moot, then keep Task.Run. On the other hand, if you really want to optimize, use the lambda and avoid the Task.Run (very small) overhead. However, that probably won't be a bottleneck.
Addendum
When I talk about a task that goes to an external system, I refer to something that runs outside of .NET. There a bit of code that will run in .NET to interface with the external system, but the bulk of the code will not run in .NET, and thus will not be in a managed thread at all.
The consumer of the API specify nothing for this to happen. The task would be a promise task, but that is not exposed, for the consumer there is nothing special about it.
In fact, a task that goes to an external system may barely run in the CPU at all. Futhermore, it might just be waiting on something exterior to the computer (it could be the network or user input).
The pattern is as follows:
The library creates a TaskCompletionSource.
The library sets a means to recieve a notification. It can be a callback, event, message loop, hook, listening to a socket, a pipe line, waiting on a global mutex... whatever is necesary.
The library sets code to react to the notification that will call SetResult, or SetException on the TaskCompletionSource as appropiate for the notification recieved.
The library does the actual call to the external system.
The library returns TaskCompletionSource.Task.
Note: with extra care of optimization not reordering things where it should not, and with care of handling errors during the setup phase. Also, if a CancellationToken is involved, it has to be taken into account (and call SetCancelled on the TaskCompletionSource when appropiate). Also, there could be tear down necesary in the reaction to the notification (or on cancellation). Ah, do not forget to validate your parameters.
Then the external system goes and does whatever it does. Then when it finishes, or something goes wrong, gives the library the notification, and your Task is sudendtly completed, faulted... (or if cancellation happened, your Task is now cancelled) and .NET will schedule the continuations of the task as needed.
Note: async/await uses continuations behind the scenes, that is how execution resumes.
Incidentally, if you wanted to implement SempahoreSlim yourself, you would have to do something very similar to what I describe above. You can see it in my backport of SemaphoreSlim.
Let us see a couple of examples of promise tasks...
Task.Delay: when we are waiting with Task.Delay, the CPU is not spinning. This is not running in a thread. In this case the notification mechanism will be an OS timer. When the OS sees that the time of the timer has elapsed, it will call into the CLR, and then the CLR will mark the task as completed. What thread was waiting? none.
FileStream.ReadSync: when we are reading from storage with FileStream.ReadSync the actual work is done by the device. The CRL has to declare a custom event, then pass the event, the file handle and the buffer to the OS... the OS calls the device driver, the device driver interfaces with the device. As the storage device recovers the information, it will write to memory (directly on the specified buffer) via DMA technology. And when it is done, it will set an interruption, that is handled by the driver, that notifies the OS, that calls the custom event, that marks the task as completed. What thread did read the data from storage? none.
A similar pattern will be used to download from a web page, except, this time the device goes to the network. How to make an HTTP request and how the system waits for a response is beyond the scope of this answer.
It is also possible that the external system is another program, in which case it would run on a thread. But it won't be a managed thread on your process.
Your take away is that these task do not run on any of your threads. And their timing might depend on external factors. Thus, it makes no sense to think of them as running in the same thread, or that we can predict their timing (well, except of course, in the case of the timer).

Both are not very good because they create the tasks immediately. The func version is a little less overhead since it saves the Task.Run route over the thread pool just to immediately end the thread pool work and suspend on the semaphore. You don't need an async Func, you could simplify this by using an async method (possibly a local function).
But you should not do this at all. Instead, use a helper method that implements a parallel async foreach.
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
Then you just go urls.ForEachAsync(myDop, async input => await ProcessAsync(input));
Here, the tasks are created on demand. You can even make the input stream lazy.

Related

How to execute an arbitrary number of async tasks in sequential order?

I have this function:
async Task RefreshProfileInfo(List<string> listOfPlayers)
// For each player in the listOfPlayers, checks an in-memory cache if we have an entry.
// If we have a cached entry, do nothing.
// If we don't have a cached entry, fetch from backend via an API call.
This function is called very frequently, like:
await RefreshProfileInfo(playerA, playerB, playerC)
or
await RefreshProfileInfo(playerB, playerC, playerD)
or
await RefreshProfileInfo(playerE, playerF)
Ideally, if the players do not overlap each other, the calls should not affect each other (requesting PlayerE and PlayerF should not block the request for PlayerA, PlayerB, PlayerC). However, if the players DO overlap each other, the second call should wait for the first (requesting PlayerB, PlayerC, PlayerD, should wait for PlayerA, PlayerB, PlayerC to finish).
However, if that isn't possible, at the very least I'd like all calls to be sequential. (I think they should still be async, so they don't block other unrelated parts of the code).
Currently, what happens is each RefreshProfileInfo runs in parallel, which results in hitting backend every time (9 times in this example).
Instead, I want to execute them sequentially, so that only the first call hits the backend, and subsequent calls just hit cache.
What data structure/approach should I use? I'm having trouble figuring out how to "connect" the separate calls to each other. I've been playing around with Task.WhenAll() as well as SemaphoreSlim, but I can't figure out how to use them properly.
Failed attempt
The idea behind my failed attempt was to have a helper class where I could call a function, SequentialRequest(Task), and it would sequentially run all tasks invoked in this manner.
List<Task> waitingTasks = new List<Task>();
object _lock = new object();
public async Task SequentialRequest(Task func)
{
var waitingTasksCopy = new List<Task>();
lock (_lock)
{
waitingTasksCopy = new List<Task>(waitingTasks);
waitingTasks.Add(func); // Add this task to the waitingTasks (for future SequentialRequests)
}
// Wait for everything before this to finish
if (waitingTasksCopy.Count > 0)
{
await Task.WhenAll(waitingTasksCopy);
}
// Run this task
await func;
}
I thought this would work, but "func" is either run instantly (instead of waiting for earlier tasks to finish), or never run at all, depending on how I call it.
If I call it using this, it runs instantly:
async Task testTask()
{
await Task.Delay(4000);
}
If I call it using this, it never runs:
Task testTask = new Task(async () =>
{
await Task.Delay(4000);
});
Here's why your current attempt doesn't work:
// Run this task
await func;
The comment above is not describing what the code is doing. In the asynchronous world, a Task represents some operation that is already in progress. Tasks are not "run" by using await; await it a way for the current code to "asynchronously wait" for a task to complete. So no function signature taking a Task is going to work; the task is already in progress before it's even passed to that function.
Your question is actually about caching asynchronous operations. One way to do this is to cache the Task<T> itself. Currently, your cache holds the results (T); you can change your cache to hold the asynchronous operations that retrieve those results (Task<T>). For example, if your current cache type is ConcurrentDictionary<PlayerId, Player>, you could change it to ConcurrentDictionary<PlayerId, Task<Player>>.
With a cache of tasks, when your code checks for a cache entry, it will find an existing entry if the player data is loaded or has started loading. Because the Task<T> represents some asynchronous operation that is already in progress (or has already completed).
A couple of notes for this approach:
This only works for in-memory caches.
Think about how you want to handle errors. A naive cache of Task<T> will also cache error results, which is usually not desired.
The second point above is the trickier part. When an error happens, you'd probably want some additional logic to remove the errored task from the cache. Bonus points (and additional complexity) if the error handling code prevents an errored task from getting into the cache in the first place.
at the very least I'd like all calls to be sequential
Well, that's much easier. SemaphoreSlim is the asynchronous replacement for lock, so you can use a shared SemaphoreSlim. Call await mySemaphoreSlim.WaitAsync(); at the beginning of RefreshProfileInfo, put the body in a try, and in the finally block at the end of RefreshProfileInfo, call mySemaphoreSlim.Release();. That will limit all calls to RefreshProfileInfo to running sequentially.
I had the same issue in one of my projects. I had multiple threads call a single method and they all made IO calls when not found in cache. What you want to do is to add the Task to your cache and then await it. Subsequent calls will then just read the result once the task completes.
Example:
private Task RefreshProfile(Player player)
{
// cache is of type IMemoryCache
return _cache.GetOrCreate(player, entry =>
{
// expire in 30 seconds
entry.AbsoluteExpiration = DateTimeOffset.UtcNow.AddSeconds(30);
return ActualRefreshCodeThatReturnsTask(player);
});
}
Then just await in your calling code
await Task.WhenAll(RefreshProfile(Player a), RefreshProfile(Player b), RefreshProfile(Player c));

Running multipe Task<> in an enterprise application in a safe way

I'm designing the software architecture for a product who can instantiate a series of "agents" doing some useful things.
Let's say each agent implement an interface having a function:
Task AsyncRun(CancellationToken token)
Because since these agents are doing a lot of I/O it could make some sense having as an async function. More over, the AsyncRun is supposed never complete, if no exception or explict cancellation occour.
Now the question is: main program has to run this on multiple agents, I would like to know the correct way of running that multiple task, signal each single completion ( that are due to cancellation/errors ):
for example I'm thinking on something like having an infinite loop like this
//.... all task cretaed are in the array tasks..
while(true)
{
await Task.WhenAny(tasks)
//.... check each single task for understand which one(s) exited
// re-run the task if requested replacing in the array tasks
}
but not sure if it is the correct ( or even best way )
And moreover I would like to know if this is the correct pattern, especially because the implementer can mismatch the RunAsync and do a blocking call, in which case the entire application will hang.
// re-run the task if requested replacing in the array tasks
This is the first thing I'd consider changing. It's far better to not let an application handle its own "restarting". If an operation failed, then there's no guarantee that an application can recover. This is true for any kind of operation in any language/runtime.
A better solution is to let another application restart this one. Allow the exception to propagate (logging it if possible), and allow it to terminate the application. Then have your "manager" process (literally a separate executable process) restart as necessary. This is the way all modern high-availability systems work, from the Win32 services manager, to ASP.NET, to the Kubernetes container manager, to the Azure Functions runtime.
Note that if you do want to take this route, it may make sense to split up the tasks to different processes, so they can be restarted independently. That way a restart in one won't cause a restart in others.
However, if you want to keep all your tasks in the same process, then the solution you have is fine. If you have a known number of tasks at the beginning of the process, and that number won't change (unless they fail), then you can simplify the code a bit by factoring out the restarting and using Task.WhenAll instead of Task.WhenAny:
async Task RunAsync(Func<CancellationToken, Task> work, CancellationToken token)
{
while (true)
{
try { await work(token); }
catch
{
// log...
}
if (we-should-not-restart)
break;
}
}
List<Func<CancellationToken, Task>> workToDo = ...;
var tasks = workToDo.Select(work => RunAsync(work, token));
await Task.WhenAll(tasks);
// Only gets here if they all complete/fail and were not restarted.
the implementer can mismatch the RunAsync and do a blocking call, in which case the entire application will hang.
The best way to prevent this is to wrap the call in Task.Run, so this:
await work(token);
becomes this:
await Task.Run(() => work(token));
In order to know whether the task completes successfully, or is cancelled or faulted, you could use a continuation. The continuation will be invoked as soon as the task finishes, whether that's because of failure, cancellation or completion. :
using (var tokenSource = new CancellationTokenSource())
{
IEnumerable<IAgent> agents; // TODO: initialize
var tasks = new List<Task>();
foreach (var agent in agents)
{
var task = agent.RunAsync(tokenSource.Token)
.ContinueWith(t =>
{
if (t.IsCanceled)
{
// Do something if cancelled.
}
else if (t.IsFaulted)
{
// Do something if faulted (with t.Exception)
}
else
{
// Do something if the task has completed.
}
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
}
In the end you will wait for the continued tasks. Also see this answer.
If you are afraid that the IAgent implementations will create blocking calls and want to prevent the application from hanging, you can wrap the call to the async method in Task.Run. This way the call to the agent is executed on the threadpool and is therefore non-blocking:
var task = Task.Run(async () =>
await agent.RunAsync(tokenSource.Token)
.ContinueWith(t =>
{
// Same as above
}));
You may want to use Task.Factory.StartNew instead to mark the task as longrunning for example.

EventHub ForEach Parallel Async

Always managing to confuse myself working with async, I'm after a bit of validation/confirmation here that i'm doing what i think i'm doing in the following scenarios..
given the following trivial example:
// pretend / assume these are json msgs or something ;)
var strEvents = new List<string> { "event1", "event2", "event3" };
i can post each event to an eventhub simply as follows:
foreach (var e in strEvents)
{
// Do some things
outEventHub.Add(e); // ICollector
}
the foreach will run on a single thread, and execute each thing inside sequentially.. the posting to eventhub will also remain on the same thread too i guess??
Changing ICollector to IAsyncCollector, and achieve the following:
foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}
I think i am right here in saying that the foreach will run on a single thread, the actual sending to the event hub will be pushed off elsewhere? Or at least not block that same thread..
Changing to Parallel.ForEach event as these events will be arriving 100+ or so at a time:
Parallel.ForEach(events, async (e) =>
{
// Do some things
await outEventHub.AddAsync(e);
});
Starting to get a bit hazy now, as i am not sure what really is going on now... afaik the each event has it's own thread (within the bounds of the hardware) and steps within that thread do not block it.. so this trivial example aside.
Finally, i could turn them all in to Tasks i thought..
private static async Task DoThingAsync(string e, IAsyncCollector<string> outEventHub)
{
await outEventHub.AddAsync(e);
}
var t = new List<Task>();
foreach (var e in strEvents)
{
t.Add(DoThingAsync(e, outEventHub));
}
await Task.WhenAll(t);
now i am really hazy, and i think this is prepping everything on a single thread.. and then running everything exactly at the same time, on any thread available??
I appreciate that in order to determine which is right for the job at hand benchmarking is required... but an explanation of what the framework is doing in each situation would be super helpful for me right now..
Parallel != async
This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:
Simple foreach
This is non-parallel and non-async. Nothing to talk about.
Await inside foreach
This is async code that is non-parallel.
foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}
This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while it is being completed (I'm guessing it does some sort of network IO) it hands back the thread to the thread pool (or UI if it was called on a UI thread) so it can do other work while wating on AddAsync to return. But as you said, is is not parallel at all.
Parallel Foreach (async)
This one is a trap! In short, Parallel.Foreach is designed for synchronous workloads. We'll get back to this but first let's assume you used it with the non-async code.
Parallel foreach (sync)
A.k.a. Parallel but not async.
Parallel.ForEach(events, (e) =>
{
// Do some things
outEventHub.Add(e);
});
Each item will get its own "Task", but they won't spawn a thread. Creating threads is expensive, and in an optimal case there is no point in having more threads than CPU cores. Instead these tasks run on a ThreadPool, which has just as many Threads as optimal. Each thread takes a task, works on it, then takes another one, etc.
You can think of it as - on a 4 core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal in case of IO bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send the event out, while they could be doing useful work. This leads us to...
Tasks
Async and potentially parallel (depends on the usage).
Your description is correct here, too, except for the ThreadPool, it is kikking off all the tasks at once (on the main thread), which then run on the pool's threads. While they are running, the main thread is released, which then can do other work, as needed. Up to this point it is the same as the Parallel.Foreach case. But:
What happens is that a TaskPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means that this task will not block while waiting for the network, but rather it releases the ThreadPool thread to pick up another workitem. When the network request completes, the tasks continuation (the remaining code lines after the network request) is scheduled back to the list of tasks.
You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.
Back to Parallel.Foreach and async
At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e);} is doing is to kick off the work, it will return right after it hits the await. (Remember that async/await is releasing threads while waiting.) Parallel.Foreach returns right after it started all of them. But nothing is awaiting these tasks! These become fire and forget, which is usually a bad practice. It is like you deleted the await Task.WhenAll call from your task example.
I hope this cleared most things for you, if not, let me know what to improve on.
Why don't you send those events asynchronously in parallel, like this:
var tasks = new List<Task>();
foreach( var e in strEvents )
{
tasks.Add(outEventHub.AddAsync(e));
}
await Task.WhenAll(tasks);
await outEventHub.FlushAsync();

Why an additional async operation is making my code faster than when the operation is not taking place at all?

I'm working on a SMS-based game (Value Added Service), in which a question must be sent to each subscriber on a daily basis. There are over 500,000 subscribers and therefore performance is a key factor. Since each subscriber can be a difference state of the competition with different variables, database must be queried separately for each subscriber before sending a text message. To achieve the best performance I'm using .Net Task Parallel Library (TPL) to spawn parallel threadpool threads and do as much async operations as possible in each thread to finally send texts asap.
Before describing the actual problem there are some more information necessary to give about the code.
At first there was no async operation in the code. I just scheduled some 500,000 tasks with the default task scheduler into the Threadpool and each task would work through the routines, blocking on all EF (Entity Framework) queries and sequentially finishing its job. It was good, but not fast enough. Then I changed all EF queries to Async, the outcome was superb in speed but there has been so many deadlocks and timeouts in SQL server that about a third of the subscribers never received a text! After trying different solutions, I decided not to do too many Async Database operations while I have over 500,000 tasks running on a 24 core server (with at least 24 concurrent threadpool threads)!
I rolled back all the changes (the Asycn ones) expect for one web service call in each task which remained Async.
Now the weird case:
In my code, I have a boolean variable named "isCrossSellActive". When the variable is set some more DB operations take place and an asycn webservice call will happen on which the thread awaits. When this variable is false, none of these operations will happen including the async webservice call. Awkwardly when the variable is set the code runs so much faster than when it's not! It seems like for some reason the awaited async code (the cooperative thread) is making the code faster.
Here is the code:
public async Task AutoSendMessages(...)
{
//Get list of subscriptions plus some initialization
LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(numberOfThreads);
TaskFactory taskFactory = new TaskFactory(lcts);
List<Task> tasks = new List<Task>();
//....
foreach (var sub in subscriptions)
{
AutoSendData data = new AutoSendData
{
ServiceId = serviceId,
MSISDN = sub.subscriber,
IsCrossSellActive = bolCrossSellHeader
};
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
}
GC.Collect();
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException ae)
{
ae.Handle((ex) =>
{
_logRepo.LogException(1, "", ex);
return true;
});
}
await _autoSendRepo.SetAutoSendingStatusEnd(statusId);
}
public async Task SendQuestion(object data)
{
//extract variables from input parameter
try
{
if (isCrossSellActive)
{
int pieceCount = subscriptionRepo.GetSubscriberCarPieces(curSubscription.service, curSubscription.subscriber).Count(c => c.isConfirmed);
foreach (var rule in csRules)
{
if (rule.Applies)
{
if (await HttpClientHelper.GetJsonAsync<bool>(url, rule.TargetServiceBaseAddress))
{
int noOfAddedPieces = SomeCalculations();
if (noOfAddedPieces > 0)
{
crossSellRepo.SetPromissedPieces(curSubscription.subscriber, curSubscription.service,
rule.TargetShortCode, noOfAddedPieces, 0, rule.ExpirationLimitDays);
}
}
}
}
}
// The rest of the code. (Some db CRUD)
await SmsClient.SendSoapMessage(subscriber, smsBody);
}
catch (Exception ex){//...}
}
Ok, thanks to #usr and the clue he gave me, the problem is finally solved!
His comment drew my attention to the awaited taskFactory.StartNew(...) line which sequentially adds new tasks to the "tasks" list which is then awaited on by Task.WaitAll(tasks);
At first I removed the await keyword before the taskFactory.StartNew() and it led the code towards a horrible state of malfunction! I then returned the await keyword to before taskFactory.StartNew() and debugged the code using breakpoints and amazingly saw that the threads are ran one after another and sequentially before the first thread reaches the first await inside the "SendQuestion" routine. When the "isCrossSellActive" flag was set despite the more jobs a thread should do the first await keyword is reached earlier thus enabling the next scheduled task to run. But when its not set the only await keyword is the last line of the routine so its most likely to run sequentially to the end.
usr's suggestion to remove the await keyword in the for loop seemed to be correct but the problem was the Task.WaitAll() line would wait on the wrong list of Task<Task<void>> instead of Task<void>. I finally used Task.Run instead of TaskFactory.StartNew and everything changed. Now the service is working well. The final code inside the for loop is:
tasks.Add(Task.Run(async () =>
{
await SendQuestion(data);
}));
and the problem was solved.
Thank you all.
P.S. Read this article on Task.Run and why TaskFactory.StartNew is dangerous: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html
It's extremly hard to tell unless you add some profiling that tell you which code is taking longer now.
Without seeing more numbers my best guess would be that the SMS service doesn't like when you send too many requests in a short time and chokes. When you add the extra DB calls the extra delay make the sms service work better.
A few other small details:
await Task.WhenAll is usually a bit better than Task.WaitAll. WaitAll means the thread will sit around waiting. Making a deadlock slightly more likely.
Instead of:
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
You should be able to do
tasks.Add(SendQuestion(data));

Why is this code running synchronously?

I am trying to understand concurrency by doing it in code. I have a code snippet which I thought was running asynchronously. But when I put the debug writeline statements in, I found that it is running synchronously. Can someone explain what I need to do differently to push ComputeBB() onto another thread using Task.Something?
Clarification I want this code to run ComputeBB in some other thread so that the main thread will keep on running without blocking.
Here is the code:
{
// part of the calling method
Debug.WriteLine("About to call ComputeBB");
returnDTM.myBoundingBox = await Task.Run(() => returnDTM.ComputeBB());
Debug.WriteLine("Just called await ComputBB.");
return returnDTM;
}
private ptsBoundingBox2d ComputeBB()
{
Debug.WriteLine("Starting ComputeBB.");
Stopwatch sw = new Stopwatch(); sw.Start();
var point1 = this.allPoints.FirstOrDefault().Value;
var returnBB = new ptsBoundingBox2d(
point1.x, point1.y, point1.z, point1.x, point1.y, point1.z);
Parallel.ForEach(this.allPoints,
p => returnBB.expandByPoint(p.Value.x, p.Value.y, p.Value.z)
);
sw.Stop();
Debug.WriteLine(String.Format("Compute BB took {0}", sw.Elapsed));
return returnBB;
}
Here is the output in the immediate window:
About to call ComputeBB
Starting ComputeBB.
Compute BB took 00:00:00.1790574
Just called await ComputBB.
Clarification If it were really running asynchronously it would be in this order:
About to call ComputeBB
Just called await ComputBB.
Starting ComputeBB.
Compute BB took 00:00:00.1790574
But it is not.
Elaboration
The calling code has signature like so: private static async Task loadAsBinaryAsync(string fileName) At the next level up, though, I attempt to stop using async. So here is the call stack from top to bottom:
static void Main(string[] args)
{
aTinFile = ptsDTM.CreateFromExistingFile("TestSave.ptsTin");
// more stuff
}
public static ptsDTM CreateFromExistingFile(string fileName)
{
ptsDTM returnTin = new ptsDTM();
Task<ptsDTM> tsk = Task.Run(() => loadAsBinaryAsync(fileName));
returnTin = tsk.Result; // I suspect the problem is here.
return retunTin;
}
private static async Task<ptsDTM> loadAsBinaryAsync(string fileName)
{
// do a lot of processing
Debug.WriteLine("About to call ComputeBB");
returnDTM.myBoundingBox = await Task.Run(() => returnDTM.ComputeBB());
Debug.WriteLine("Just called await ComputBB.");
return returnDTM;
}
I have a code snippet which I thought was running asynchronously. But when I put the debug writeline statements in, I found that it is running synchronously.
await is used to asynchronously wait an operations completion. While doing so, it yields control back to the calling method until it's completion.
what I need to do differently to push ComputeBB() onto another thread
It is already ran on a thread pool thread. If you don't want to asynchronously wait on it in a "fire and forget" fashion, don't await the expression. Note this will have an effect on exception handling. Any exception which occurs inside the provided delegate would be captured inside the given Task, if you don't await, there is a chance they will go about unhandled.
Edit:
Lets look at this piece of code:
public static ptsDTM CreateFromExistingFile(string fileName)
{
ptsDTM returnTin = new ptsDTM();
Task<ptsDTM> tsk = Task.Run(() => loadAsBinaryAsync(fileName));
returnTin = tsk.Result; // I suspect the problem is here.
return retunTin;
}
What you're currently doing is synchronously blocking when you use tsk.Result. Also, for some reason you're calling Task.Run twice, once in each method. That is unnecessary. If you want to return your ptsDTM instance from CreateFromExistingFile, you will have to await it, there is no getting around that. "Fire and Forget" execution doesn't care about the result, at all. It simply wants to start whichever operation it needs, if it fails or succeeds is usually a non-concern. That is clearly not the case here.
You'll need to do something like this:
private PtsDtm LoadAsBinary(string fileName)
{
Debug.WriteLine("About to call ComputeBB");
returnDTM.myBoundingBox = returnDTM.ComputeBB();
Debug.WriteLine("Just called ComputeBB.");
return returnDTM;
}
And then somewhere up higher up the call stack, you don't actually need CreateFromExistingFiles, simply call:
Task.Run(() => LoadAsBinary(fileName));
When needed.
Also, please, read the C# naming conventions, which you're currently not following.
await's whole purpose is in adding the synchronicity back in asynchronous code. This allows you to easily partition the parts that are happenning synchronously and asynchronously. Your example is absurd in that it never takes any advantage whatsoever of this - if you just called the method directly instead of wrapping it in Task.Run and awaiting that, you would have had the exact same result (with less overhead).
Consider this, though:
await
Task.WhenAll
(
loadAsBinaryAsync(fileName1),
loadAsBinaryAsync(fileName2),
loadAsBinaryAsync(fileName3)
);
Again, you have the synchronicity back (await functions as the synchronization barrier), but you've actually performed three independent operations asynchronously with respect to each other.
Now, there's no reason to do something like this in your code, since you're using Parallel.ForEach at the bottom level - you're already using the CPU to the max (with unnecessary overhead, but let's ignore that for now).
So the basic usage of await is actually to handle asynchronous I/O rather than CPU work - apart from simplifying code that relies on some parts of CPU work being synchronised and some not (e.g. you have four threads of execution that simultaneously process different parts of the problem, but at some point have to be reunited to make sense of the individual parts - look at the Barrier class, for example). This includes stuff like "making sure the UI doesn't block while some CPU intensive operation happens in the background" - this makes the CPU work asynchronous with respect to the UI. But at some point, you still want to reintroduce the synchronicity, to make sure you can display the results of the work on the UI.
Consider this winforms code snippet:
async void btnDoStuff_Click(object sender, EventArgs e)
{
lblProgress.Text = "Calculating...";
var result = await DoTheUltraHardStuff();
lblProgress.Text = "Done! The result is " + result;
}
(note that the method is async void, not async Task nor async Task<T>)
What happens is that (on the GUI thread) the label is first assigned the text Calculating..., then the asynchronous DoTheUltraHardStuff method is scheduled, and then, the method returns. Immediately. This allows the GUI thread to do whatever it needs to do. However - as soon as the asynchronous task is complete and the GUI is free to handle the callback, the execution of btnDoStuff_Click will continue with the result already given (or an exception thrown, of course), back on the GUI thread, allowing you to set the label to the new text including the result of the asynchronous operation.
Asynchronicity is not an absolute property - stuff is asynchronous to some other stuff, and synchronous to some other stuff. It only makes sense with respect to some other stuff.
Hopefully, now you can go back to your original code and understand the part you've misunderstood before. The solutions are multiple, of course, but they depend a lot on how and why you're trying to do what you're trying to do. I suspect you don't actually need to use Task.Run or await at all - the Parallel.ForEach already tries to distribute the CPU work over multiple CPU cores, and the only thing you could do is to make sure other code doesn't have to wait for that work to finish - which would make a lot of sense in a GUI application, but I don't see how it would be useful in a console application with the singular purpose of calculating that single thing.
So yes, you can actually use await for fire-and-forget code - but only as part of code that doesn't prevent the code you want to continue from executing. For example, you could have code like this:
Task<string> result = SomeHardWorkAsync();
Debug.WriteLine("After calling SomeHardWorkAsync");
DoSomeOtherWorkInTheMeantime();
Debug.WriteLine("Done other work.");
Debug.WriteLine("Got result: " + (await result));
This allows SomeHardWorkAsync to execute asynchronously with respect to DoSomeOtherWorkInTheMeantime but not with respect to await result. And of course, you can use awaits in SomeHardWorkAsync without trashing the asynchronicity between SomeHardWorkAsync and DoSomeOtherWorkInTheMeantime.
The GUI example I've shown way above just takes advantage of handling the continuation as something that happens after the task completes, while ignoring the Task created in the async method (there really isn't much of a difference between using async void and async Task when you ignore the result). So for example, to fire-and-forget your method, you could use code like this:
async void Fire(string filename)
{
var result = await ProcessFileAsync(filename);
DoStuffWithResult(result);
}
Fire("MyFile");
This will cause DoStuffWithResult to execute as soon as result is ready, while the method Fire itself will return immediately after executing ProcessFileAsync (up to the first await or any explicit return someTask).
This pattern is usually frowned upon - there really isn't any reason to return void out of an async method (apart from event handlers); you could just as easily return Task (or even Task<T> depending on the scenario), and let the caller decide whether he wants his code to execute synchronously in respect to yours or not.
Again,
async Task FireAsync(string filename)
{
var result = await ProcessFileAsync(filename);
DoStuffWithResult(result);
}
Fire("MyFile");
does the same thing as using async void, except that the caller can decide what to do with the asynchronous task. Perhaps he wants to launch two of those in parallel and continue after all are done? He can just await Task.WhenAll(Fire("1"), Fire("2")). Or he just wants that stuff to happen completely asynchronously with respect to his code, so he'll just call Fire("1") and ignore the resulting Task (of course, ideally, you at the very least want to handle possible exceptions).

Categories