Creating parallel tasks, how to limit new task creation?

In my program I'm starting parallel tasks one by one (http requests), doing it far ahead of any actual results those tasks can return.
It looks like
while (continueTaskSwarming == true)
{
Task.Run(() => Task());
}
Of course, that way I have far more tasks scheduled than can be processed, so when I cancel them through a cancellation token, it takes a long time to cancel them all: they are already scheduled, so each one must actually start running before it can be cancelled.
How can I organize task creation so that there is always only a limited number of tasks ready to run, with new tasks scheduled only when there is some "place" for them? I mean some "straightforward" way, something like
while (continueTaskSwarming == true)
{
PauseUntilScheduledTaskCountIsLowerThan(100);
Task.Run(() => Task());
}
Is that possible?
EDIT:
The obvious way, I think, is to put all created tasks into some list, removing them on completion and checking the count of active/scheduled tasks. But I don't know (and don't know the keywords to google it) whether a task can "remove" itself from the list, or be removed from a ContinueWith() method. Does anyone know?

There are lots of ways to do this. You can protect a counter (of current tasks) with a lock, using Monitor.Wait() to have the task-producing thread wait until the counter is decremented and it's safe to start another task.
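A minimal sketch of the lock-protected counter, assuming a hypothetical TaskGate helper (neither that name nor MonitorThrottleDemo comes from the question) and Thread.Sleep as a placeholder for the real request. The producer blocks in Enter() while the limit is reached, and each task calls Exit() when it finishes:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: a counter of in-flight tasks guarded by a lock.
public sealed class TaskGate
{
    private readonly object _sync = new object();
    private readonly int _limit;
    private int _inFlight;

    public TaskGate(int limit) { _limit = limit; }

    // Highest number of tasks that were in flight at the same time.
    public int Peak { get; private set; }

    // Blocks the producer until fewer than _limit tasks are in flight.
    public void Enter()
    {
        lock (_sync)
        {
            while (_inFlight >= _limit)
                Monitor.Wait(_sync);   // releases the lock while waiting
            _inFlight++;
            Peak = Math.Max(Peak, _inFlight);
        }
    }

    // Called by a task when it finishes; wakes one waiting producer.
    public void Exit()
    {
        lock (_sync)
        {
            _inFlight--;
            Monitor.Pulse(_sync);
        }
    }
}

public static class MonitorThrottleDemo
{
    // Starts `total` tasks, never more than `limit` at once, and returns the peak.
    public static int Run(int total, int limit)
    {
        var gate = new TaskGate(limit);
        var tasks = new Task[total];
        for (int i = 0; i < total; i++)
        {
            gate.Enter();              // producer pauses here when the gate is full
            tasks[i] = Task.Run(() =>
            {
                try { Thread.Sleep(10); }   // placeholder for the real work
                finally { gate.Exit(); }
            });
        }
        Task.WaitAll(tasks);
        return gate.Peak;
    }
}
```

Calling MonitorThrottleDemo.Run(20, 3) should never report more than 3 tasks in flight. Because only a handful of tasks exist at any moment, cancelling has very little to unwind.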
Another approach is to use the Semaphore synchronization object. This does basically the same thing, but encapsulated in an easy-to-use object. You create a new Semaphore, specifying how many concurrent holders you want to allow at once. By having each task hold a slot in the semaphore, and having the task producer also wait on the semaphore, you ensure that once you reach the maximum number of tasks the producer won't be able to continue and create another task until one of the currently running tasks completes.
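The semaphore idea can be sketched as follows. This version uses SemaphoreSlim (the lightweight, awaitable relative of Semaphore) rather than Semaphore itself, so the producer can wait without blocking a thread. The class name and the Task.Delay standing in for the HTTP request are assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottleDemo
{
    // Runs `total` copies of a placeholder job, at most `limit` at a time.
    // Returns the highest number of jobs observed running at once.
    public static async Task<int> RunAsync(int total, int limit, CancellationToken ct = default)
    {
        var slots = new SemaphoreSlim(limit, limit);
        var tasks = new List<Task>();
        var sync = new object();
        int running = 0, peak = 0;

        for (int i = 0; i < total; i++)
        {
            // The producer waits here until one of the slots is free,
            // so no more than `limit` tasks are ever created ahead of time.
            await slots.WaitAsync(ct);

            tasks.Add(Task.Run(async () =>
            {
                try
                {
                    lock (sync) { running++; peak = Math.Max(peak, running); }
                    await Task.Delay(10);   // placeholder for the HTTP request
                    lock (sync) { running--; }
                }
                finally
                {
                    slots.Release();        // free the slot for the producer
                }
            }));
        }

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

If the token is cancelled, WaitAsync throws OperationCanceledException in the producer, and only the few tasks already started remain to finish.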

If I understand correctly, you are trying to achieve limited concurrency while executing tasks.
This will help you:
http://msdn.microsoft.com/en-us/library/ee789351(v=vs.110).aspx
Limited concurrency level task scheduler (with task priority) handling wrapped tasks
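As a lighter-weight alternative to the custom scheduler sample linked above, the framework's ConcurrentExclusiveSchedulerPair can cap how many delegates run at once. This is a swapped-in technique, not the linked sample, and the class name and Thread.Sleep workload below are illustrative assumptions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SchedulerLimitDemo
{
    // Queues `total` delegates on a scheduler that runs at most `limit` of them
    // concurrently, and returns the peak concurrency that was observed.
    public static int Run(int total, int limit)
    {
        var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, limit);
        var factory = new TaskFactory(pair.ConcurrentScheduler);

        var sync = new object();
        int running = 0, peak = 0;
        var tasks = new Task[total];

        for (int i = 0; i < total; i++)
        {
            tasks[i] = factory.StartNew(() =>
            {
                lock (sync) { running++; peak = Math.Max(peak, running); }
                Thread.Sleep(10);       // placeholder for synchronous work
                lock (sync) { running--; }
            });
        }

        Task.WaitAll(tasks);
        return peak;
    }
}
```

Note that a scheduler limits how many delegates execute at once, not how many tasks are created: all the tasks are still queued up front. It also counts only synchronous delegate segments, so for async lambdas that await I/O a semaphore is usually the better fit.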

Related

Continue workflow while async operation is running

I've commented on Eric Lippert's answer to What's the difference between Task.Start/Wait and Async/Await?
I am asking this question since I am still unsure if I understand the concept correctly and how I'd achieve my goal. Adding tons of comments is not very helpful for anyone.
What I understand: await tells the compiler that the current thread has the capacity to perform some other computation and get back once the awaited operation is done. That means the workflow is interrupted until the awaited operation is done. This doesn't speed up the computation of the context which contains the await but increases the overall application performance due to better usage of workers.
Now my issue: I'd like to continue the workflow and in the end make sure the operation is done. So basically allow the worker to continue the current workflow even if the awaitable operation is not complete and await completion at the end of the workflow. I'd like the worker to spend time on my workflow and not run away and help someone else.
What I think might work: Consider n async Add operations and a Flush operation which processes added items. Flush requires the items to be added. But adding items doesn't require the previous item to be added. So basically I'd like to collect all running Add operations and await all of them once all have been added. And after they have been awaited they should be Flushed.
Can I do this by adding the Add Tasks to a list and in the end await all those tasks?
Or is this pseudo-async and has no benefit in the end?
Is it the same as awaiting all the Add operations directly? (Without collecting them)

What I understand: await tells the compiler that the current thread has the capacity to perform some other computation and get back once the awaited operation is done.
That's pretty close. A better way to characterize it is: await means suspend this workflow until the awaited task is complete. If the workflow is suspended because the task isn't done, that frees up this thread to find more work to do, and the workflow will be scheduled to resume at some point in the future when the task is done. The choice of what to do while waiting is given to the code that most recently called this workflow; that is, an await is actually a fancy return. After all, return means "let my caller decide what to do next".
If the task is done at the point of the await then the workflow simply continues normally.
Await is an asynchronous wait. It waits for a task to be done, but it keeps busy while it is waiting.
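The "await is a fancy return" behaviour can be observed with a small sketch. All names here are hypothetical; a TaskCompletionSource stands in for an operation whose completion the caller controls. When the awaited task is still pending, control returns to the caller at the await; when it is already complete, the workflow simply continues:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitReturnDemo
{
    public static async Task WorkflowAsync(Task awaited, List<string> log)
    {
        log.Add("workflow: before await");
        await awaited;                      // returns to the caller if not yet done
        log.Add("workflow: after await");
    }

    // Records the order of events for a pending or an already-completed task.
    public static List<string> Run(bool alreadyDone)
    {
        var log = new List<string>();
        var tcs = new TaskCompletionSource<bool>();
        if (alreadyDone) tcs.SetResult(true);

        Task workflow = WorkflowAsync(tcs.Task, log);
        log.Add("caller: got control back");

        if (!alreadyDone) tcs.SetResult(true);   // lets the workflow resume
        workflow.GetAwaiter().GetResult();       // blocking only to keep the demo synchronous
        return log;
    }
}
```

With Run(false) the caller's entry lands between the two workflow entries; with Run(true) both workflow entries come first.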
I'd like to continue the workflow and in the end make sure the operation is done. So basically allow the worker to continue the current workflow even if the awaitable operation is not complete and await completion at the end of the workflow. I'd like the worker to spend time on my workflow and not run away and help someone else.
Sure, that's fine. Don't await a task until the last possible moment, when you need the task to be complete before the workflow can continue. That's a best practice.
However: if your workflow is doing operations that are taking more than let's say 30 milliseconds without awaiting something, and you're on the UI thread, then you are potentially freezing the UI and irritating the user.
Can I do this by adding the Add Tasks to a list and in the end await all those tasks?
Of course you can; that's a good idea. Use the WhenAll combinator to easily await all of a sequence of tasks.
Is it the same as awaiting all the Add operations directly? (Without collecting them)
No, it's different. As you correctly note, awaiting each Add operation will ensure that no Add starts until the previous one completes. If there's no requirement that they be serialized in this manner, you can make a more efficient workflow by starting the tasks first, and then awaiting them after they're all started.
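To make the difference concrete, here is a small sketch that counts how many Add operations are in flight at once. AddAsync, the Probe helper, and the 20 ms delay are assumptions standing in for the asker's real Add operation:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AddOrderingDemo
{
    // Counts how many Add operations are in flight at the same time.
    private sealed class Probe
    {
        private readonly object _sync = new object();
        private int _active;
        public int Peak { get; private set; }
        public void Enter() { lock (_sync) { _active++; Peak = Math.Max(Peak, _active); } }
        public void Leave() { lock (_sync) { _active--; } }
    }

    // Hypothetical stand-in for the asynchronous Add operation.
    private static async Task AddAsync(int item, Probe probe)
    {
        probe.Enter();
        await Task.Delay(20);               // simulated asynchronous I/O
        probe.Leave();
    }

    // Awaiting each Add directly: the next one starts only after this one completes.
    public static async Task<int> AwaitEachAsync(int count)
    {
        var probe = new Probe();
        for (int i = 0; i < count; i++)
            await AddAsync(i, probe);
        return probe.Peak;
    }

    // Starting every Add first and awaiting them together at the end.
    public static async Task<int> StartAllThenAwaitAsync(int count)
    {
        var probe = new Probe();
        var pending = new List<Task>();
        for (int i = 0; i < count; i++)
            pending.Add(AddAsync(i, probe)); // started, not yet awaited
        await Task.WhenAll(pending);
        return probe.Peak;
    }
}
```

AwaitEachAsync never has more than one Add in flight, while StartAllThenAwaitAsync overlaps them, which is where the efficiency gain comes from.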

If I understand your question correctly, what you want to do is parallelize the asynchronous work, which is very common.
Consider the following code:
async Task Add(Item item) { ... }
async Task YourMethod()
{
var tasks = new List<Task>();
foreach (var item in collection)
{
tasks.Add(Add(item));
}
// do any work you need
Console.WriteLine("Working...");
// then ensure the tasks are done
await Task.WhenAll(tasks);
// and flush them out
await Flush();
}

Multiple await operations or just one

I've seen how the await keyword is implemented and resulting structure it creates. I think I have a rudimentary understanding of it. However, is
public async Task DoWork()
{
await this.Operation1Async();
await this.Operation2Async();
await this.Operation3Async();
}
"better" (generally speaking) or
public async Task DoWork()
{
await this.Operation1Async();
this.Operation2();
this.Operation3();
}
The problem with the first approach is that it is creating a new Task for each await call? Which entails a new thread?
Whereas the second creates a new Task on the first await and then everything from there is processed in the new Task?
Edit
Ok maybe I wasn't too clear, but if for example we have
while (await reader.ReadAsync())
{
//...
}
await reader.NextResultAsync();
// ...
Is this not creating two tasks? One in the main thread with the first ReadAsync, then another task in this newly created task with the NextResultAsync. My question is: is there really a need for the second task? Isn't the one task created in the main thread sufficient? So
while (await reader.ReadAsync())
{
//...
}
reader.NextResult();
// ...

it is creating a new Task for each await call? Which entails a new thread?
Yes and no. Yes, it is creating a Task for each asynchronous method; the async state machine will create one. However, these tasks are not threads, nor do they even run on threads. They do not "run" anywhere.
You may find some blog posts of mine useful:
async intro, which explains how async/await work.
There Is No Thread, which explains how tasks can work without threads.
Intro to the Task type, which explains how some tasks (Delegate Tasks) have code and run on threads, but the tasks used by async (Promise Tasks) do not.
Whereas the second creates a new Task on the first await and then everything from there is processed in the new Task?
Not at all. Tasks only complete once, and the method will not continue past the await until that task is complete. So, the task returned by Operation1Async has already completed before Operation2 is even called.
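The distinction between a task that runs code and a task that merely represents a future completion can be sketched with TaskCompletionSource. This is an illustration under assumed names, not code from the linked posts. The promise-style task below has no delegate and no thread; it just sits in WaitingForActivation until something completes it:

```csharp
using System;
using System.Threading.Tasks;

public static class PromiseTaskDemo
{
    public static (TaskStatus Before, TaskStatus After, int Result) Run()
    {
        // Promise-style task: nothing is executing on its behalf.
        var tcs = new TaskCompletionSource<int>();
        Task<int> promise = tcs.Task;
        TaskStatus before = promise.Status;     // WaitingForActivation

        // Whoever owns the completion source (for example an I/O callback) finishes it.
        tcs.SetResult(42);
        return (before, promise.Status, promise.Result);
    }

    // Delegate-style task for contrast: this one has code and runs on a thread pool thread.
    public static Task<int> DelegateTask()
    {
        return Task.Run(() => 6 * 7);
    }
}
```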

The two examples are not functionally equivalent, so choose one depending on your specific needs. In the first example the three operations execute sequentially. In the second example, if Operation2 and Operation3 return tasks (as the snippet below assumes), those tasks are started without being awaited, so they run concurrently and the DoWork method could return before they have completed.
If you want to ensure that the tasks have completed before leaving the DoWork method body you might need to do this:
public async Task DoWork()
{
await this.Operation1Async();
this.Operation2().GetAwaiter().GetResult();
this.Operation3().GetAwaiter().GetResult();
}
which of course is absolutely terrible and you should never do it, because it blocks the calling thread; in that case, go with the first example. If those tasks use I/O completion ports then you should definitely take advantage of them instead of blocking the main thread.
If on the other hand you are asking whether you should make Operation2 and Operation3 asynchronous, then the answer is this: if they are doing I/O-bound work where you can take advantage of I/O Completion Ports, then you should absolutely make them async and go with the first approach. If they are CPU-bound operations where you cannot use IOCP, then it might be better to leave them synchronous, because it wouldn't make any sense to execute these CPU-bound operations in a separate task that you would block on anyway.

The problem with the first approach is that it is creating a new Task for each await call? Which entails a new thread?
This is your misunderstanding, which is leading you to be suspicious of the code in the first example.
A Task does not entail a new thread. A Task certainly can be run on a new thread if you want to do so, but an important use of tasks is when a task directly or indirectly works through asynchronous i/o, which happens when the task, or one that it in turn awaits on, uses async i/o to access files or network streams (e.g. web or database access) allowing a pooled thread to be returned to the pool until that i/o has completed.
As such, if the task does not complete immediately (it may complete immediately if, e.g., its purpose could be fulfilled entirely from currently-filled buffers), the thread currently running it can be returned to the pool and used to do something else in the meantime. When the i/o completes, another thread from the pool can take over and complete that task, which in turn lets any task awaiting it continue, and so on.
As such, the first example in your question allows fewer threads to be used in total, especially when other work will also be using threads from the same pool.
In the second example, once the first await has completed, the thread that handled its completion will be blocked in the synchronous equivalent methods. If other operations also need threads from the pool, then, because that thread is not returned to it, fresh threads will have to be spun up. As such, the second example is the one that will need more threads.

One is not better than the other, they do different things.
In the first example, each operation is scheduled and performed on a thread, represented by a Task. Note: It's not guaranteed what thread they happen on.
The await keyword means (loosely) "wait until this asynchronous operation has finished and then continue". The continuation is not necessarily run on the same thread either.
This means example one is a sequential (one-after-another) processing of asynchronous operations. Just because a Task is created doesn't imply that a Thread is also created: the TaskScheduler uses a pool of threads that have already been created, so very little overhead is actually introduced.
In your second example, the await will call the first operation using the scheduler, then call the next two as normal. No Task or Thread is created for the second two calls, nor is it calling methods on a Task.
In the first example, you can also look into making your asynchronous calls concurrent, which starts all three operations so they can run "at the same time" (not guaranteed) and then waits until they have all finished executing.
public async Task DoWork()
{
var o1 = this.Operation1Async();
var o2 = this.Operation2Async();
var o3 = this.Operation3Async();
await Task.WhenAll(o1, o2, o3);
}

C# Async - How many objects are created here?

I am trying to understand the code I wrote,
for (int i = 0; i < 5; i++)
{
ExecuteCMD("/c robocopy C:\\Source D:\\Source /MIR", true);
}
public async void ExecuteCMD(string cmdLine, bool waitForExit)
{
await Task.Factory.StartNew(() =>
{
ExecuteProcess proc = new ExecuteProcess(cmdLine, waitForExit);
proc.Execute();
} );
}
The async method ExecuteCMD will run in a loop for 5 times. I know async doesn't create new threads. So, are there 5 objects created with the same name ('proc') in the same thread? Please explain
Many thanks in advance!

Do you mean your ExecuteProcess proc object? This is a local variable of your lambda function, so there is no conflict in your code.
The lambda
() =>
{
ExecuteProcess proc = new ExecuteProcess(cmdLine, waitForExit);
proc.Execute();
}
is called 5 times, and each call creates exactly one instance of ExecuteProcess for its own proc variable.

You are using Task.Factory.StartNew, so you will most likely (see Stephen Cleary's comment) end up on the default TaskScheduler, which happens to execute work on thread pool threads. Your ExecuteProcess allocation and Execute call will therefore occur 5 times, as expected, on thread pool threads (see above point re default scheduler) - and most likely in parallel to each other, and in parallel to your for loop (this last part might be difficult to wrap your head around, but that's the whole problem with async void - the execution order is non-deterministic; more on that later).
You are sort of right in that async/await does not necessarily create new threads. async/await is all about chaining tasks and their continuations so that they execute in correct order with respect to each other. Where the actual Task runs is determined by how that Task is created. Here you are explicitly requesting your work to be pushed out to the thread pool, because that is where Tasks created by Task.Factory.StartNew execute.
Others have pointed out that you might be using async void erroneously - possibly due to lack of understanding. async void is only good for scheduling work in a fire-and-forget manner, and this work needs to be self-contained, complete with its own exception handling and concurrency controls, because your async void will run unobserved, in parallel to the rest of your code. It's like saying: "I want this piece of code to run at some point in the future. I don't need to know when it completes or whether it raised any exceptions - I'll just wait until it hits the first await, and then carry on executing my own work - and the rest of the async void will proceed on its own, in parallel, without supervision". Because of this, if you put some code after your for loop, it will most likely execute before your ExecuteProcess work, which may or may not be what you want.
Here's a step-by-step view of what actually happens in your application.
You hit the first iteration of the for loop on the main thread. The runtime calls ExecuteCMD, as if it were any other, synchronous method call. It enters the method and executes the code preceding the await, still as part of the first for loop iteration, on the main thread. Then it schedules some work on the thread pool via Task.Factory.StartNew. This work will execute at some point in the future on, let's say, thread pool thread #1. The Task returned by Task.Factory.StartNew is then awaited. This task cannot possibly complete synchronously, so the await schedules the rest of your async void to run in the future (on the main thread, after the task created by Task.Factory.StartNew has completed) and yields. At this point your ExecuteProcess work probably hasn't even started yet, but the main thread is already free to jump to the second iteration of the for loop. It does exactly that, which results in another task being scheduled to run on, say, thread pool thread #2, at some point in the future - followed by yet another continuation scheduled to run on the main thread. The for loop then jumps to the next item.
By the time your for loop ends, you will most likely have 5 Tasks waiting to execute on thread pool threads, followed by 5 continuation Tasks waiting to execute on the main thread. They will all complete at some point in the future. Your code won't know when because you told it that you don't care (by using async void).
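If the caller does need to know when the work has finished, one option is to return Task instead of void so that the five operations can be observed. The sketch below is a variation on the question's ExecuteCMD, not the asker's code; the runCommand delegate is a hypothetical stand-in for creating and executing an ExecuteProcess:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ObservedCommandsDemo
{
    // Same shape as ExecuteCMD, but returning Task lets callers await completion
    // and observe exceptions. runCommand is a placeholder for ExecuteProcess.Execute().
    public static async Task ExecuteCmdAsync(string cmdLine, Action<string> runCommand)
    {
        await Task.Run(() => runCommand(cmdLine));
    }

    // Starts `count` commands and returns how many had completed once all were awaited.
    public static async Task<int> RunAllAsync(int count)
    {
        int completed = 0;
        var pending = new List<Task>();

        for (int i = 0; i < count; i++)
        {
            pending.Add(ExecuteCmdAsync("/c echo " + i,
                _ => Interlocked.Increment(ref completed)));
        }

        // Code placed after this line is guaranteed to run after every command finished.
        await Task.WhenAll(pending);
        return completed;
    }
}
```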

How to achieve sequential blocking behavior in multithread application?

I'm writing an application that should simulate the behavior of a PLC. This means I have to run several threads making sure only one thread at a time is active and all others are suspended.
For example:
thread 1 repeats every 130ms and blocks all other threads. The effective runtime is 30ms and the remaining 100ms before the thread restarts can be used by other threads.
thread 2 repeats every 300ms and blocks all threads except for thread 1. The effective runtime is 50ms (the remaining 250ms can be used by other threads). Thread 2 is paused until thread 1 has finished executing code (the remaining 100ms of thread 1) and once thread 1 is asleep it resumes from where it has been paused
thread 3 repeats every 1000ms. The effective runtime is 100ms. This thread continues execution only if all other threads are suspended.
The highest priority is to complete the tasks before they are called again, otherwise I have to react, therefore a thread that should be blocked should not run until a certain point, otherwise multicore processing would elaborate the code and only wait to pass the results.
I read several posts and learned that Thread.Suspend is not recommended, and that semaphore or monitor operations mean that the code is executed until a specific and fixed point in the code, while I have to pause the threads exactly where the execution has arrived when another thread (with higher "priority") is called.
I also looked at the priority setting but it doesn't seem to be 100% relevant since the system can override priorities.
Is there a correct or at least solid way to code the blocking mechanism?

I don't think you need to burden yourself with Threads at all. Instead, you can use Tasks with a prioritised TaskScheduler (it's not too hard to write or find by googling).
This makes the code quite easy to write, for example the highest priority thread might be something like:
while (!cancellationRequested)
{
var repeatTask = Task.Delay(130);
// Do your high priority work
await repeatTask;
}
Your other tasks will have a similar basic layout, but they will be given a lower priority in the task scheduler (this is usually handled by the task scheduler having a separate queue for each of the task priorities). Once in a while, they can check whether there is a higher priority task, and if so, they can do await Task.Yield();. In fact, in your case, it seems like you don't even need real queues - that makes this a lot easier, and even better, allows you to use Task.Yield really efficiently.
The end result is that all three of your periodic tasks are efficiently run on just a single thread (or even no thread at all if they're all waiting).
This does rely on coöperative multi-tasking, of course. It's not really possible to handle full blown real-time like pre-emption on Windows - and partial pre-emption solutions tend to be full of problems. If you're in control of most of the time spent in the task (and offload any other work using asynchronous I/O), the coöperative solution is actually far more efficient, and can give you a lot less latency (though it really shouldn't matter much).

I hope I don't misunderstand your question :)
One possible solution to your problem might be to use a concurrent queue: https://msdn.microsoft.com/de-de/library/dd267265(v=vs.110).aspx
For example, you create an enum to track your state and initialize the queue:
private readonly ConcurrentQueue<Action> _clientActions = new ConcurrentQueue<Action>();
private enum Statuskatalog
{
Idle,
Busy
};
// Current state; set by the timer callback and read when enqueuing.
private Statuskatalog _status = Statuskatalog.Idle;
Create a timer and a timer callback function:
Timer _taskTimer = new Timer(ProcessPendingTasks, null, 100, 333);
private void ProcessPendingTasks(object x)
{
_status = Statuskatalog.Busy;
_taskTimer.Change(Timeout.Infinite, Timeout.Infinite);
Action currentTask;
while( _clientActions.TryDequeue( out currentTask ))
{
var task = new Task(currentTask);
task.Start();
task.Wait();
}
_status=Statuskatalog.Idle;
}
Now you only have to add your tasks as delegates to the queue:
_clientActions.Enqueue(delegate { **Your task** });
if (_status == Statuskatalog.Idle) _taskTimer.Change(0, 333);
On this base, you can manage your special requirements you were asking for.
I hope this is what you were looking for.

Synchronously run a task in the same thread (no threadpool) with a timeout

I want to use Task<> type, but not with TPL, but with .NET4.5/C#async instead.
Thing is, I have some requirements for my case:
I want the task to be run synchronously (some people recommend RunSynchronously(), others Wait(), and others ContinueWith(_, TaskContinuationOptions.ExecuteSynchronously), which one is the adequate here?).
I want the task to run in the same thread (so, not use the threadpool at all).
I want the task to stop after a certain timeout has passed, and throw an exception.
For the latter, I think I need Task.Delay() but I'm not sure how to combine it with the first two requirements.
Thanks

This answer is based on @svick's comment.
I'm going to make the assumption that you want all the "work" of the method to be done on the same thread as the caller, but that you don't mind if a thread pool thread is used for cancellation purposes (I'm assuming this since you mentioned Task.Delay, which uses a Timer, which in turn uses a thread pool thread when the timer fires).
That said, there would be no need for Task, since when the method returns you would know for certain that the Task was completed. Just a regular method with a timeout will do:
static void DoSomethingOrThrowAfterTimeout(int millisecondsTimeout)
{
CancellationTokenSource cts = new CancellationTokenSource(millisecondsTimeout);
CancellationToken ct = cts.Token;
// do some work
ct.ThrowIfCancellationRequested();
// do more work
ct.ThrowIfCancellationRequested();
// repeat until done.
}
Obviously, with this approach of using cooperative cancellation, the method won't time out exactly at the timeout, as that depends on how finely you can split up the work in the method.
If you want to avoid the usage of another thread (for the CancellationTokenSource), then you could track the starting time and then check how much time has passed (to see if you've exceeded the timeout) at various points in the method (like how ct.ThrowIfCancellationRequested() is used above).
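A minimal sketch of that elapsed-time variant, assuming the work can be split into small units (the method name, the unitOfWork delegate, and the choice of TimeoutException are illustrative, not from the question). Everything runs on the calling thread, and a Stopwatch is checked between units:

```csharp
using System;
using System.Diagnostics;

public static class StopwatchTimeoutDemo
{
    // Runs `steps` units of work on the calling thread and throws
    // TimeoutException if the time budget is exceeded between units.
    public static int DoWorkOrThrow(int steps, TimeSpan timeout, Action unitOfWork)
    {
        var clock = Stopwatch.StartNew();
        int done = 0;

        for (int i = 0; i < steps; i++)
        {
            // Cooperative check: the timeout is only noticed between units of work.
            if (clock.Elapsed > timeout)
                throw new TimeoutException($"Timed out after {done} of {steps} steps.");

            unitOfWork();
            done++;
        }

        return done;
    }
}
```

As with the token-based version, the precision of the timeout depends on how long a single unit of work takes.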
