Always managing to confuse myself working with async, I'm after a bit of validation/confirmation here that i'm doing what i think i'm doing in the following scenarios..
given the following trivial example:
// pretend / assume these are json msgs or something ;)
var strEvents = new List<string> { "event1", "event2", "event3" };
i can post each event to an eventhub simply as follows:
foreach (var e in strEvents)
{
// Do some things
outEventHub.Add(e); // ICollector
}
the foreach will run on a single thread, and execute each thing inside sequentially.. the posting to eventhub will also remain on the same thread too i guess??
Changing ICollector to IAsyncCollector, and achieve the following:
foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}
I think i am right here in saying that the foreach will run on a single thread, the actual sending to the event hub will be pushed off elsewhere? Or at least not block that same thread..
Changing to Parallel.ForEach event as these events will be arriving 100+ or so at a time:
Parallel.ForEach(events, async (e) =>
{
// Do some things
await outEventHub.AddAsync(e);
});
Starting to get a bit hazy now, as i am not sure what really is going on now... afaik the each event has it's own thread (within the bounds of the hardware) and steps within that thread do not block it.. so this trivial example aside.
Finally, i could turn them all in to Tasks i thought..
private static async Task DoThingAsync(string e, IAsyncCollector<string> outEventHub)
{
await outEventHub.AddAsync(e);
}
var t = new List<Task>();
foreach (var e in strEvents)
{
t.Add(DoThingAsync(e, outEventHub));
}
await Task.WhenAll(t);
now i am really hazy, and i think this is prepping everything on a single thread.. and then running everything exactly at the same time, on any thread available??
I appreciate that in order to determine which is right for the job at hand benchmarking is required... but an explanation of what the framework is doing in each situation would be super helpful for me right now..
Parallel != async
This is the main idea here. Both of them have their uses, and they can be used together, but they are very different. You are mostly right with your assumptions, but let me clarify:
Simple foreach
This is non-parallel and non-async. Nothing to talk about.
Await inside foreach
This is async code that is non-parallel.
foreach (var e in strEvents)
{
// Do some things
await outEventHub.AddAsync(e);
}
This will all take place on a single thread. It takes an event, starts adding it to your event hub, and while it is being completed (I'm guessing it does some sort of network IO) it hands back the thread to the thread pool (or UI if it was called on a UI thread) so it can do other work while wating on AddAsync to return. But as you said, is is not parallel at all.
Parallel Foreach (async)
This one is a trap! In short, Parallel.Foreach is designed for synchronous workloads. We'll get back to this but first let's assume you used it with the non-async code.
Parallel foreach (sync)
A.k.a. Parallel but not async.
Parallel.ForEach(events, (e) =>
{
// Do some things
outEventHub.Add(e);
});
Each item will get its own "Task", but they won't spawn a thread. Creating threads is expensive, and in an optimal case there is no point in having more threads than CPU cores. Instead these tasks run on a ThreadPool, which has just as many Threads as optimal. Each thread takes a task, works on it, then takes another one, etc.
You can think of it as - on a 4 core machine - having 4 workers around a pile of tasks, so 4 of them are being run at a time. You can imagine that this is not ideal in case of IO bound workloads (which this most likely is). If your network is slow, you can have all 4 threads blocked on trying to send the event out, while they could be doing useful work. This leads us to...
Tasks
Async and potentially parallel (depends on the usage).
Your description is correct here, too, except for the ThreadPool, it is kikking off all the tasks at once (on the main thread), which then run on the pool's threads. While they are running, the main thread is released, which then can do other work, as needed. Up to this point it is the same as the Parallel.Foreach case. But:
What happens is that a TaskPool thread picks up a task, does the necessary preprocessing, then sends out the network request asynchronously. This means that this task will not block while waiting for the network, but rather it releases the ThreadPool thread to pick up another workitem. When the network request completes, the tasks continuation (the remaining code lines after the network request) is scheduled back to the list of tasks.
You can see that theoretically this is the most efficient process, so fast that you have to be careful not to flood your network.
Back to Parallel.Foreach and async
At this point you should be able to spot the problem. All your async lambda async (e) => { await outEventHub.AddAsync(e);} is doing is to kick off the work, it will return right after it hits the await. (Remember that async/await is releasing threads while waiting.) Parallel.Foreach returns right after it started all of them. But nothing is awaiting these tasks! These become fire and forget, which is usually a bad practice. It is like you deleted the await Task.WhenAll call from your task example.
I hope this cleared most things for you, if not, let me know what to improve on.
Why don't you send those events asynchronously in parallel, like this:
var tasks = new List<Task>();
foreach( var e in strEvents )
{
tasks.Add(outEventHub.AddAsync(e));
}
await Task.WhenAll(tasks);
await outEventHub.FlushAsync();
Related
I trying to log data into Cassandra using c#. So my aim is to log as much data points as I can in 200ms.
I am trying to save time, random key and value in 200ms. Please see code for refrence. the problem how can I execute session after while loop.
Cluster cluster = Cluster.Builder()
.AddContactPoint("127.0.0.1")
.Build();
ISession session = cluster.Connect("log"); //keyspace to connect with
var ps = session.Prepare("Insert into logcassandra(nanodate, key, value) values (?,?,?)");
stopwatch.Start();
while(stop.ElapsedMilliseconds <= 200)
{
i++;
var statement = ps.Bind(nanoTime(),"key"+i,"value"+i);
session.ExecuteAsync(statement);
}
Please prefer System.Threading.Timer with a TimerCallback over Stopwatch.
EDIT: (reply to the comment)
Hi, I'm not sure what you want to achieve, but here are some general concepts about async calls and parallel execution. In .NET world the async is mainly used for Non-blocking I/O operations, which means your caller thread will not wait for the response of the I/O driver. In other words, you instantiate an I/O operation and dispatch this work to a "thing" which is outside of the .NET ecosystem and that will gives you back a future (a Task). The driver acknowledges back that it received the request and it promises that it will process it once it has free capacity.
That Task represents an async work that either succeeded or fail. But because you are calling it asynchronously you are not awaiting its result (not blocking the caller thread to wait for external work) rather move on to the next statement. Eventually this operation will be finished and at that time the driver will notify that Task that a request operation has been finished. (The Task can be seen as the primary communication channel between the caller and the callee)
In your case you are using a fire and forget style async call. That means you are firing off a lot of I/O operations in async and you forget to process the result of them. You don't know either any of them failed or not. But you have called the Casandra to do a lot of staff. Your time measurement is used only for firing off jobs, which means you have no idea how much of these jobs has been finished.
If you would choose to use await against your async calls, that would mean that your while loop would be serially executed. You would firing off a job and you can't move on to the next iteration because you are awaiting it, so your caller thread will move one level higher in its call stack and examines if it can processed with something. If there is an await as well, then it moves one level higher and so on...
while(stop.ElapsedMilliseconds <= 200)
{
await session.ExecuteAsync(statement);
}
If you don't want serial execution rather parallel, you can create as many jobs as you need and await them as a whole. That's where Task.WhenAll comes into the play. You will fire off a lot of jobs and you will await that single job that will track all of other jobs.
var cassandraCalls = new List<Task>();
cassandraCalls.AddRange(Enumerable.Range(0, 100).Select(_ => session.ExecuteAsync(statement)));
await Task.WhenAll(cassandraCalls);
But this code will run until all of the jobs are finished. If you want to constrain the whole execution time then you should use some cancellation mechanism. Task.WhenAll does not support CancellationToken. But you can overcome of this limitation in several way. The simplest solution is a combination of the Task.Delay and the Task.WhenAny. Task.Delay will be used for the timeout, and Task.WhenAny will be used to await either the your cassandra calls or the timeout to complete.
var cassandraCalls = new List<Task>();
cassandraCalls.AddRange(Enumerable.Range(0, 100).Select(_ => ExecuteAsync()));
await Task.WhenAny(Task.WhenAll(cassandraCalls), Task.Delay(1000));
In this way, you have fired off as many jobs as you wanted and depending on your driver they may be executed in parallel or concurrently. You are awaiting either to finish all or elapse a certain amount of time. When the WhenAny job finishes then you can examine the result of the jobs, but simply iterating over the cassandraCalls
foreach (var call in cassandraCalls)
{
Console.WriteLine(call.IsCompleted);
}
I hope this explanation helped you a bit.
This might not be specific to SemaphoreSlim exclusively, but basically my question is about whether there is a difference between the below two methods of throttling a collection of long running tasks, and if so, what that difference is (and when if ever to use either).
In the example below, let's say that each tracked task involves loading data from a Url (totally made up example, but is a common one that I've found for SemaphoreSlim examples).
The main difference comes down to how the individual tasks are added to the list of tracked tasks. In the first example, we call Task.Run() with a lambda, whereas in the second, we new up a Func(<Task<Result>>()) with a lambda and then immediately call that func and add the result to the tracked task list.
Examples:
Using Task.Run():
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
await ss.WaitAsync().ConfigureAwait(false);
trackedTasks.Add(Task.Run(async () =>
{
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
}));
}
var results = await Task.WhenAll(trackedTasks);
Using a new Func:
SemaphoreSlim ss = new SemaphoreSlim(_concurrentTasks);
List<string> urls = ImportUrlsFromSource();
List<Task<Result>> trackedTasks = new List<Task<Result>>();
foreach (var item in urls)
{
trackedTasks.Add(new Func<Task<Result>>(async () =>
{
await ss.WaitAsync().ConfigureAwait(false);
try
{
return await ProcessUrl(item);
}
catch (Exception e)
{
_log.Error($"logging some stuff");
throw;
}
finally
{
ss.Release();
}
})());
}
var results = await Task.WhenAll(trackedTasks);
There are two differences:
Task.Run does error handling
First off all, when you call the lambda, it runs. On the other hand, Task.Run would call it. This is relevant because Task.Run does a bit of work behind the scenes. The main work it does is handling a faulted task...
If you call a lambda, and the lambda throws, it would throw before you add the Task to the list...
However, in your case, because your lambda is async, the compiler would create the Task for it (you are not making it by hand), and it will correctly handle the exception and make it available via the returned Task. Therefore this point is moot.
Task.Run prevents task attachment
Task.Run sets DenyChildAttach. This means that the tasks created inside the Task.Run run independently from (are not synchronized with) the returned Task.
For example, this code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(Task.Run(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
}));
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in unknown order. However the following code:
List<Task<int>> trackedTasks = new List<Task<int>>();
var numbers = new int[]{0, 1, 2, 3, 4};
foreach (var item in numbers)
{
trackedTasks.Add(new Func<Task<int>>(async () =>
{
var x = 0;
(new Func<Task<int>>(async () =>{x = item; return x;}))().Wait();
Console.WriteLine(x);
return x;
})());
}
var results = await Task.WhenAll(trackedTasks);
Will output the numbers from 0 to 4, in order, every time. This is odd, right? What happens is that the inner task is attached to outer one, and executed right away in the same thread. But if you use Task.Run, the inner task is not attached and scheduled independently.
This remain true even if you use await, as long as the task you await does not go to an external system...
What happens with external system? Well, for example, if your task is reading from an URL - as in your example - the system would create a TaskCompletionSource, get the Task from it, set a response handler that writes the result to the TaskCompletionSource, make the request, and return the Task. This Task is not scheduled, it running on the same thread as a parent task makes no sense. And thus, it can break the order.
Since, you are using await to wait on an external system, this point is moot too.
Conclusion
I must conclude that these are equivalent.
If you want to be safe, and make sure it works as expected, even if - in a future version - some of the above points stops being moot, then keep Task.Run. On the other hand, if you really want to optimize, use the lambda and avoid the Task.Run (very small) overhead. However, that probably won't be a bottleneck.
Addendum
When I talk about a task that goes to an external system, I refer to something that runs outside of .NET. There a bit of code that will run in .NET to interface with the external system, but the bulk of the code will not run in .NET, and thus will not be in a managed thread at all.
The consumer of the API specify nothing for this to happen. The task would be a promise task, but that is not exposed, for the consumer there is nothing special about it.
In fact, a task that goes to an external system may barely run in the CPU at all. Futhermore, it might just be waiting on something exterior to the computer (it could be the network or user input).
The pattern is as follows:
The library creates a TaskCompletionSource.
The library sets a means to recieve a notification. It can be a callback, event, message loop, hook, listening to a socket, a pipe line, waiting on a global mutex... whatever is necesary.
The library sets code to react to the notification that will call SetResult, or SetException on the TaskCompletionSource as appropiate for the notification recieved.
The library does the actual call to the external system.
The library returns TaskCompletionSource.Task.
Note: with extra care of optimization not reordering things where it should not, and with care of handling errors during the setup phase. Also, if a CancellationToken is involved, it has to be taken into account (and call SetCancelled on the TaskCompletionSource when appropiate). Also, there could be tear down necesary in the reaction to the notification (or on cancellation). Ah, do not forget to validate your parameters.
Then the external system goes and does whatever it does. Then when it finishes, or something goes wrong, gives the library the notification, and your Task is sudendtly completed, faulted... (or if cancellation happened, your Task is now cancelled) and .NET will schedule the continuations of the task as needed.
Note: async/await uses continuations behind the scenes, that is how execution resumes.
Incidentally, if you wanted to implement SempahoreSlim yourself, you would have to do something very similar to what I describe above. You can see it in my backport of SemaphoreSlim.
Let us see a couple of examples of promise tasks...
Task.Delay: when we are waiting with Task.Delay, the CPU is not spinning. This is not running in a thread. In this case the notification mechanism will be an OS timer. When the OS sees that the time of the timer has elapsed, it will call into the CLR, and then the CLR will mark the task as completed. What thread was waiting? none.
FileStream.ReadSync: when we are reading from storage with FileStream.ReadSync the actual work is done by the device. The CRL has to declare a custom event, then pass the event, the file handle and the buffer to the OS... the OS calls the device driver, the device driver interfaces with the device. As the storage device recovers the information, it will write to memory (directly on the specified buffer) via DMA technology. And when it is done, it will set an interruption, that is handled by the driver, that notifies the OS, that calls the custom event, that marks the task as completed. What thread did read the data from storage? none.
A similar pattern will be used to download from a web page, except, this time the device goes to the network. How to make an HTTP request and how the system waits for a response is beyond the scope of this answer.
It is also possible that the external system is another program, in which case it would run on a thread. But it won't be a managed thread on your process.
Your take away is that these task do not run on any of your threads. And their timing might depend on external factors. Thus, it makes no sense to think of them as running in the same thread, or that we can predict their timing (well, except of course, in the case of the timer).
Both are not very good because they create the tasks immediately. The func version is a little less overhead since it saves the Task.Run route over the thread pool just to immediately end the thread pool work and suspend on the semaphore. You don't need an async Func, you could simplify this by using an async method (possibly a local function).
But you should not do this at all. Instead, use a helper method that implements a parallel async foreach.
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
Then you just go urls.ForEachAsync(myDop, async input => await ProcessAsync(input));
Here, the tasks are created on demand. You can even make the input stream lazy.
Currently I am using a Parallel foreach loop, the code looks something like this.
Parallel.ForEach(ListData, toPost =>
{
try
{
string result = toPost.Post();
if (!result.Contains("Exception"))
{
this.Dispatcher.Invoke(() =>
{
updateLog("Success");
});
}
}
catch (Exception ex)
{
this.Dispatcher.Invoke(() =>
{
updateLog("Exception: " + ex.Message);
});
}
});
However it is going a bit slow. I am trying to fire off as many of these HttpWebRequests as possible, and the only thing I care about is the response and that is to update the UI. I read about async requests, but am confused as to how they are actually 'async' and the different between Parallel and Async.
I found somebody who asked something similar. Getting the Response of a Asynchronous HttpWebRequest
Could I just put it in a for loop? Going back to my example if I implemented async, would it be something like
foreach(toPost in ListData)
{
var task = toPost.PostAsync();
// do something with task
}
From what I understand about async, this would continue looping regardless of whether or not PostAsync() returned anything yet? If so, can I use async inside a parallel foreach loop and that would be even quicker?
Sorry If these questions are stupid. I'm quite new to all of this and am looking for the quickest way to do this, am just having trouble quite understanding the difference between async and parallel and which is better for my situation.
Parallel.ForEach uses a thread pool thread to execute the body of your code. Since the bulk of the time your thread is waiting for the I/O operation to complete it is a rather wasteful use of an expensive thread, a thread that could have been doing something else instead.
await on an async I/O operation starts the I/O call but does not wait for it to complete. In fact there is no thread for async I/O due to the use of I/O Completion Ports. It's a rather complex scenario best explained in the former link.
However, do not mix await and Parallel.ForEach, the latter is not compatible with the tasks and threading of the former.
If you have multiple operations you want to perform asynchronoursly, then you could use Task.WhenAll or take a look at TPL DataFlow.
From what I understand about async, this would continue looping regardless of whether or not PostAsync() returned anything yet? If so, can I use async inside a parallel foreach loop and that would be even quicker?
No. For each iteration of the loop you await the Task and so the next operation want execute until the task completes. Essentially the loop is serial and so are the tasks. If you want all the tasks to execute in parallel, add them to say a List<Task> and call await Task.WhenAll(tasks).
Let's say I have a Windows Service which is doing some bit of work, then sleeping for a short amount of time, over and over forever (until the service is shut down). So in the service's OnStart, I could start up a thread whose entry point is something like:
private void WorkerThreadFunc()
{
while (!shuttingDown)
{
DoSomething();
Thread.Sleep(10);
}
}
And in the service's OnStop, I somehow set that shuttingDown flag and then join the thread. Actually there might be several such threads, and other threads too, all started in OnStart and shut down/joined in OnStop.
If I want to instead do this sort of thing in an async/await based Windows Service, it seems like I could have OnStart create cancelable tasks but not await (or wait) on them, and have OnStop cancel those tasks and then Task.WhenAll().Wait() on them. If I understand correctly, the equivalent of the "WorkerThreadFunc" shown above might be something like:
private async Task WorkAsync(CancellationToken cancel)
{
while (true)
{
cancel.ThrowIfCancellationRequested();
DoSomething();
await Task.Delay(10, cancel).ConfigureAwait(false);
}
}
Question #1: Uh... right? I am new to async/await and still trying to get my head around it.
Assuming that's right, now let's say that DoSomething() call is (or includes) a synchronous write I/O to some piece of hardware. If I'm understanding correctly:
Question #2: That is bad? I shouldn't be doing synchronous I/O within a Task in an async/await-based program? Because it ties up a thread from the thread pool while the I/O is happening, and threads from the thread pool are a highly limited resource? Please note that I might have dozens of such Workers going simultaneously to different pieces of hardware.
I am not sure I'm understanding that correctly - I am getting the idea that it's bad from articles like Stephen Cleary's "Task.Run Etiquette Examples: Don't Use Task.Run for the Wrong Thing", but that's specifically about it being bad to do blocking work within Task.Run. I'm not sure if it's also bad if I'm just doing it directly, as in the "private async Task Work()" example above?
Assuming that's bad too, then if I understand correctly I should instead utilize the nonblocking version of DoSomething (creating a nonblocking version of it if it doesn't already exist), and then:
private async Task WorkAsync(CancellationToken cancel)
{
while (true)
{
cancel.ThrowIfCancellationRequested();
await DoSomethingAsync(cancel).ConfigureAwait(false);
await Task.Delay(10, cancel).ConfigureAwait(false);
}
}
Question #3: But... what if DoSomething is from a third party library, which I must use and cannot alter, and that library doesn't expose a nonblocking version of DoSomething? It's just a black box set in stone that at some point does a blocking write to a piece of hardware.
Maybe I wrap it and use TaskCompletionSource? Something like:
private async Task WorkAsync(CancellationToken cancel)
{
while (true)
{
cancel.ThrowIfCancellationRequested();
await WrappedDoSomething().ConfigureAwait(false);
await Task.Delay(10, cancel).ConfigureAwait(false);
}
}
private Task WrappedDoSomething()
{
var tcs = new TaskCompletionSource<object>();
DoSomething();
tcs.SetResult(null);
return tcs.Task;
}
But that seems like it's just pushing the issue down a bit further rather than resolving it. WorkAsync() will still block when it calls WrappedDoSomething(), and only get to the "await" for that after WrappedDoSomething() has already completed the blocking work. Right?
Given that (if I understand correctly) in the general case async/await should be allowed to "spread" all the way up and down in a program, would this mean that if I need to use such a library, I essentially should not make the program async/await-based? I should go back to the Thread/WorkerThreadFunc/Thread.Sleep world?
What if an async/await-based program already exists, doing other things, but now additional functionality that uses such a library needs to be added to it? Does that mean that the async/await-based program should be rewritten as a Thread/etc.-based program?
Actually there might be several such threads, and other threads too, all started in OnStart and shut down/joined in OnStop.
On a side note, it's usually simpler to have a single "master" thread that will start/join all the others. Then OnStart/OnStop just deals with the master thread.
If I want to instead do this sort of thing in an async/await based Windows Service, it seems like I could have OnStart create cancelable tasks but not await (or wait) on them, and have OnStop cancel those tasks and then Task.WhenAll().Wait() on them.
That's a perfectly acceptable approach.
If I understand correctly, the equivalent of the "WorkerThreadFunc" shown above might be something like:
Probably want to pass the CancellationToken down; cancellation can be used by synchronous code, too:
private async Task WorkAsync(CancellationToken cancel)
{
while (true)
{
DoSomething(cancel);
await Task.Delay(10, cancel).ConfigureAwait(false);
}
}
Question #1: Uh... right? I am new to async/await and still trying to get my head around it.
It's not wrong, but it only saves you one thread on a Win32 service, which doesn't do much for you.
Question #2: That is bad? I shouldn't be doing synchronous I/O within a Task in an async/await-based program? Because it ties up a thread from the thread pool while the I/O is happening, and threads from the thread pool are a highly limited resource? Please note that I might have dozens of such Workers going simultaneously to different pieces of hardware.
Dozens of threads are not a lot. Generally, asynchronous I/O is better because it doesn't use any threads at all, but in this case you're on the desktop, so threads are not a highly limited resource. async is most beneficial on UI apps (where the UI thread is special and needs to be freed), and ASP.NET apps that need to scale (where the thread pool limits scalability).
Bottom line: calling a blocking method from an asynchronous method is not bad but it's not the best, either. If there is an asynchronous method, call that instead. But if there isn't, then just keep the blocking call and document it in the XML comments for that method (because an asynchronous method blocking is rather surprising behavior).
I am getting the idea that it's bad from articles like Stephen Cleary's "Task.Run Etiquette Examples: Don't Use Task.Run for the Wrong Thing", but that's specifically about it being bad to do blocking work within Task.Run.
Yes, that is specifically about using Task.Run to wrap synchronous methods and pretend they're asynchronous. It's a common mistake; all it does is trade one thread pool thread for another.
Assuming that's bad too, then if I understand correctly I should instead utilize the nonblocking version of DoSomething (creating a nonblocking version of it if it doesn't already exist)
Asynchronous is better (in terms of resource utilization - that is, fewer threads used), so if you want/need to reduce the number of threads, you should use async.
Question #3: But... what if DoSomething is from a third party library, which I must use and cannot alter, and that library doesn't expose a nonblocking version of DoSomething? It's just a black box set in stone that at some point does a blocking write to a piece of hardware.
Then just call it directly.
Maybe I wrap it and use TaskCompletionSource?
No, that doesn't do anything useful. That just calls it synchronously and then returns an already-completed task.
But that seems like it's just pushing the issue down a bit further rather than resolving it. WorkAsync() will still block when it calls WrappedDoSomething(), and only get to the "await" for that after WrappedDoSomething() has already completed the blocking work. Right?
Yup.
Given that (if I understand correctly) in the general case async/await should be allowed to "spread" all the way up and down in a program, would this mean that if I need to use such a library, I essentially should not make the program async/await-based? I should go back to the Thread/WorkerThreadFunc/Thread.Sleep world?
Assuming you already have a blocking Win32 service, it's probably fine to just keep it as it is. If you are writing a new one, personally I would make it async to reduce threads and allow asynchronous APIs, but you don't have to do it either way. I prefer Tasks over Threads in general, since it's much easier to get results from Tasks (including exceptions).
The "async all the way" rule only goes one way. That is, once you call an async method, then its caller should be async, and its caller should be async, etc. It does not mean that every method called by an async method must be async.
So, one good reason to have an async Win32 service would be if there's an async-only API you need to consume. That would cause your DoSomething method to become async DoSomethingAsync.
What if an async/await-based program already exists, doing other things, but now additional functionality that uses such a library needs to be added to it? Does that mean that the async/await-based program should be rewritten as a Thread/etc.-based program?
No. You can always just block from an async method. With proper documentation so when you are reusing/maintaining this code a year from now, you don't swear at your past self. :)
If you still spawn your threads, well, yes, it's bad. Because it will not give you any benefit as the thread is still allocated and consuming resources for the specific purpose of running your worker function. Running a few threads to be able to do work in parallel within a service has a minimal impact on your application.
If DoSomething() is synchronous, you could switch to the Timer class instead. It allows multiple timers to use a smaller amount of threads.
If it's important that the jobs can complete, you can modify your worker classes like this:
SemaphoreSlim _shutdownEvent = new SemaphoreSlim(0,1);
public async Task Stop()
{
return await _shutdownEvent.WaitAsync();
}
private void WorkerThreadFunc()
{
while (!shuttingDown)
{
DoSomething();
Thread.Sleep(10);
}
_shutdownEvent.Release();
}
.. which means that during shutdown you can do this:
var tasks = myServices.Select(x=> x.Stop());
Task.WaitAll(tasks);
A thread can only do one thing at a time. While it is working on your DoSomething it can't do anything else.
In an interview Eric Lippert described async-await in a restaurant metaphor. He suggests to use async-await only for functionality where your thread can do other things instead of waiting for a process to complete, like respond to operator input.
Alas, your thread is not waiting, it is doing hard work in DoSomething. And as long as DoSomething is not awaiting, your thread will not return from DoSomething to do the next thing.
So if your thread has something meaningful to do while procedure DoSomething is executing, it's wise to let another thread do the DoSomething, while your original thread is doing the meaningful stuff. Task.Run( () => DoSomething()) could do this for you. As long as the thread that called Task.Run doesn't await for this task, it is free to do other things.
You also want to cancel your process. DoSomething can't be cancelled. So even if cancellation is requested you'll have to wait until DoSomething is completed.
Below is your DoSomething in a form with a Start button and a Cancel button. While your thread is DoingSomething, one of the meaningful things your GUI thread may want to do is respond to pressing the cancel button:
void CancellableDoSomething(CancellationToken token)
{
while (!token.IsCancellationRequested)
{
DoSomething()
}
}
async Task DoSomethingAsync(CancellationToken token)
{
var task = Task.Run(CancellableDoSomething(token), token);
// if you have something meaningful to do, do it now, otherwise:
return Task;
}
CancellationTokenSource cancellationTokenSource = null;
private async void OnButtonStartSomething_Clicked(object sender, ...)
{
if (cancellationTokenSource != null)
// already doing something
return
// else: not doing something: start doing something
cancellationTokenSource = new CancellationtokenSource()
var task = AwaitDoSomethingAsync(cancellationTokenSource.Token);
// if you have something meaningful to do, do it now, otherwise:
await task;
cancellationTokenSource.Dispose();
cancellationTokenSource = null;
}
private void OnButtonCancelSomething(object sender, ...)
{
if (cancellationTokenSource == null)
// not doing something, nothing to cancel
return;
// else: cancel doing something
cancellationTokenSource.Cancel();
}
I'm working on a SMS-based game (Value Added Service), in which a question must be sent to each subscriber on a daily basis. There are over 500,000 subscribers and therefore performance is a key factor. Since each subscriber can be a difference state of the competition with different variables, database must be queried separately for each subscriber before sending a text message. To achieve the best performance I'm using .Net Task Parallel Library (TPL) to spawn parallel threadpool threads and do as much async operations as possible in each thread to finally send texts asap.
Before describing the actual problem there are some more information necessary to give about the code.
At first there was no async operation in the code. I just scheduled some 500,000 tasks with the default task scheduler into the Threadpool and each task would work through the routines, blocking on all EF (Entity Framework) queries and sequentially finishing its job. It was good, but not fast enough. Then I changed all EF queries to Async, the outcome was superb in speed but there has been so many deadlocks and timeouts in SQL server that about a third of the subscribers never received a text! After trying different solutions, I decided not to do too many Async Database operations while I have over 500,000 tasks running on a 24 core server (with at least 24 concurrent threadpool threads)!
I rolled back all the changes (the Asycn ones) expect for one web service call in each task which remained Async.
Now the weird case:
In my code, I have a boolean variable named "isCrossSellActive". When the variable is set some more DB operations take place and an asycn webservice call will happen on which the thread awaits. When this variable is false, none of these operations will happen including the async webservice call. Awkwardly when the variable is set the code runs so much faster than when it's not! It seems like for some reason the awaited async code (the cooperative thread) is making the code faster.
Here is the code:
public async Task AutoSendMessages(...)
{
//Get list of subscriptions plus some initialization
LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(numberOfThreads);
TaskFactory taskFactory = new TaskFactory(lcts);
List<Task> tasks = new List<Task>();
//....
foreach (var sub in subscriptions)
{
AutoSendData data = new AutoSendData
{
ServiceId = serviceId,
MSISDN = sub.subscriber,
IsCrossSellActive = bolCrossSellHeader
};
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
}
GC.Collect();
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException ae)
{
ae.Handle((ex) =>
{
_logRepo.LogException(1, "", ex);
return true;
});
}
await _autoSendRepo.SetAutoSendingStatusEnd(statusId);
}
public async Task SendQuestion(object data)
{
//extract variables from input parameter
try
{
if (isCrossSellActive)
{
int pieceCount = subscriptionRepo.GetSubscriberCarPieces(curSubscription.service, curSubscription.subscriber).Count(c => c.isConfirmed);
foreach (var rule in csRules)
{
if (rule.Applies)
{
if (await HttpClientHelper.GetJsonAsync<bool>(url, rule.TargetServiceBaseAddress))
{
int noOfAddedPieces = SomeCalculations();
if (noOfAddedPieces > 0)
{
crossSellRepo.SetPromissedPieces(curSubscription.subscriber, curSubscription.service,
rule.TargetShortCode, noOfAddedPieces, 0, rule.ExpirationLimitDays);
}
}
}
}
}
// The rest of the code. (Some db CRUD)
await SmsClient.SendSoapMessage(subscriber, smsBody);
}
catch (Exception ex){//...}
}
Ok, thanks to #usr and the clue he gave me, the problem is finally solved!
His comment drew my attention to the awaited taskFactory.StartNew(...) line which sequentially adds new tasks to the "tasks" list which is then awaited on by Task.WaitAll(tasks);
At first I removed the await keyword before the taskFactory.StartNew() and it led the code towards a horrible state of malfunction! I then returned the await keyword to before taskFactory.StartNew() and debugged the code using breakpoints and amazingly saw that the threads are ran one after another and sequentially before the first thread reaches the first await inside the "SendQuestion" routine. When the "isCrossSellActive" flag was set despite the more jobs a thread should do the first await keyword is reached earlier thus enabling the next scheduled task to run. But when its not set the only await keyword is the last line of the routine so its most likely to run sequentially to the end.
usr's suggestion to remove the await keyword in the for loop seemed to be correct but the problem was the Task.WaitAll() line would wait on the wrong list of Task<Task<void>> instead of Task<void>. I finally used Task.Run instead of TaskFactory.StartNew and everything changed. Now the service is working well. The final code inside the for loop is:
tasks.Add(Task.Run(async () =>
{
await SendQuestion(data);
}));
and the problem was solved.
Thank you all.
P.S. Read this article on Task.Run and why TaskFactory.StartNew is dangerous: http://blog.stephencleary.com/2013/08/startnew-is-dangerous.html
It's extremly hard to tell unless you add some profiling that tell you which code is taking longer now.
Without seeing more numbers my best guess would be that the SMS service doesn't like when you send too many requests in a short time and chokes. When you add the extra DB calls the extra delay make the sms service work better.
A few other small details:
await Task.WhenAll is usually a bit better than Task.WaitAll. WaitAll means the thread will sit around waiting. Making a deadlock slightly more likely.
Instead of:
tasks.Add(await taskFactory.StartNew(async (x) =>
{
await SendQuestion(x);
}, data));
You should be able to do
tasks.Add(SendQuestion(data));