I am trying to understand asynchronous programming in C#. I created an example of a function that must only be run by one caller at a time, for example reading from a big file. I want to create a queue, so that requests wait in the queue until they can safely read the file.
AsyncWork aw = new AsyncWork();
Task.Run(() => {
for (int count = 0; count < 10; count++) {
aw.DoAsyncWork(count);
Console.WriteLine("request added");
}
});
Console.ReadKey();
And the AsyncWork class:
public class AsyncWork {
int Requests = 0;
int Workers = 0;
public AsyncWork() { }
public async Task DoAsyncWork(int count) {
Requests++;
while (Workers > 0) ;
Workers++;
Requests--;
if (Workers > 1) Console.WriteLine("MORE WORKERS AT ONCE");
Console.WriteLine(count.ToString() + " Started task");
//reading from file
await Task.Delay(1000);
Console.WriteLine(count.ToString() + " Ended task");
Workers--;
}
}
I expected all 10 requests to be added first and then processed one by one. But the output looks like this:
0 Started task
request added
0 Ended task
1 Started task
request added
1 Ended task
2 Started task
request added
2 Ended task
3 Started task
request added
3 Ended task
4 Started task
request added
4 Ended task
5 Started task
request added
5 Ended task
6 Started task
request added
6 Ended task
7 Started task
request added
7 Ended task
8 Started task
request added
8 Ended task
9 Started task
request added
9 Ended task
Why is it running synchronously?
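When you run the code, your first worker starts working; the next one then gets stuck in your busy loop, while (Workers > 0) ;, and won't move on until the previous worker finishes. When it does start, the next worker gets stuck there in turn, and so on for each iteration of the loop that starts the workers. Because that busy loop runs synchronously on the thread executing your for loop, the loop cannot add the next request either, which is why "request added" is interleaved with the task output.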
So you only ever have at most one worker doing work, and one worker pegging an entire CPU sitting there waiting for it to finish.
A proper way to synchronize access when writing asynchronous code is to use a SemaphoreSlim and its WaitAsync method, which asynchronously waits until the semaphore can be acquired, rather than synchronously blocking the thread (and pegging the CPU while you're at it) until the other workers finish. It also has the advantage of being safe to use from multiple threads, unlike a plain int, which is not safe to modify from multiple threads without synchronization.
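A minimal sketch of that suggestion, reusing the question's AsyncWork and DoAsyncWork names. The fields _gate, _workers and MaxWorkers are illustrative additions (MaxWorkers only exists so the one-at-a-time behaviour can be checked), and Task.Delay(100) is a shortened stand-in for the file read. Unlike the original, the caller keeps the returned tasks and awaits them, so the program does not rely on Console.ReadKey to stay alive.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class AsyncWork
{
    // Initial count 1, maximum count 1: at most one caller holds the gate at a time.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private int _workers;                       // only touched while holding _gate
    public int MaxWorkers { get; private set; } // highest _workers value observed

    public async Task DoAsyncWork(int count)
    {
        // If the gate is taken, this returns an incomplete Task to the caller
        // instead of spinning a thread in a busy loop.
        await _gate.WaitAsync();
        try
        {
            _workers++;
            MaxWorkers = Math.Max(MaxWorkers, _workers);
            Console.WriteLine(count + " Started task");
            await Task.Delay(100); // stand-in for reading from the file
            Console.WriteLine(count + " Ended task");
        }
        finally
        {
            _workers--;
            _gate.Release(); // always release, even if the work throws
        }
    }
}

public static class AsyncWorkDemo
{
    public static async Task Main()
    {
        var aw = new AsyncWork();
        // Start all ten requests, keeping the tasks so they can be awaited.
        Task[] requests = Enumerable.Range(0, 10).Select(i => aw.DoAsyncWork(i)).ToArray();
        Console.WriteLine("all requests added");
        await Task.WhenAll(requests);
        Console.WriteLine("max workers at once: " + aw.MaxWorkers);
    }
}
```

Every call after the first gets back an incomplete Task while it waits on the semaphore, so the loop can add all ten requests without tying up a thread. The sketch does not depend on waiters being released in arrival order.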
Related
I have a method that converts a CSV file into a particular model, and I want to split the work into multiple tasks because there are 700k+ records. I'm using .Skip and .Take in the method so that each run of the method knows where to start and how many records to take. I have a list of numbers 1-10 that I iterate over to create the tasks, using the iterator in some math to determine how many records to skip.
Here's how I'm creating the tasks:
var numberOfTasksList = Enumerable.Range(1, 10).ToList();
//I left out the math to determine rowsPerTask used as a parameter in the below method for brevity's sake
var tasks = numberOfTasksList.Select(i
=> ReadRowsList<T>(props, fields, csv, context, zohoEntities, i, i*rowsPerTask, (i-1)*rowsPerTask));
await Task.WhenAll(tasks);
The ReadRowsList method used looks like this (without the parameters):
public static async Task<string> ReadRowsList<T>(...parameters) where T : class, new()
{
//work to run
return $"added rows for task {i}";
}
The string the method returns is just a simple line, $"added rows for task {i}", so it's not really proper async/await; I'm just returning a string to signal that the iteration is done.
However, when I run the program, the method waits for the first iteration (where i=1) to complete before starting the second one, so it's not running in parallel. I'm not the best when it comes to async/parallel programming, but is there something obvious that would force each task to wait until the previous iteration finishes before it starts? From my understanding, creating the tasks with the code above and using .WhenAll(tasks) would create a new thread for each iteration, but I must be missing something.
In short:
async does not equal multiple threads; and
making a function async Task does not make it asynchronous
When Task.WhenAll is given tasks produced by pretend-async code that has no awaits, each call runs to completion on the current thread before it even returns its Task, so the thread cannot 'let go' of the task at hand to start processing another one.
As was pointed out in the comments, the compiler warns you about this (warning CS1998):
This async method lacks 'await' operators and will run synchronously. Consider using the 'await' operator to await non-blocking API calls, or 'await Task.Run(...)' to do CPU-bound work on a background thread.
Trivial example
Let's consider two functions with identical signatures, one with async code and one without.
static async Task DoWorkPretendAsync(int taskId)
{
Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > start");
Thread.Sleep(TimeSpan.FromSeconds(1));
Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > done");
}
static async Task DoWorkAsync(int taskId)
{
Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > start");
await Task.Delay(TimeSpan.FromSeconds(1));
Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > done");
}
If we test them with the following snippet
await DoItAsync(DoWorkPretendAsync);
Console.WriteLine();
await DoItAsync(DoWorkAsync);
async Task DoItAsync(Func<int, Task> f)
{
var tasks = Enumerable.Range(start: 0, count: 3).Select(i => f(i));
Console.WriteLine("Before WhenAll");
await Task.WhenAll(tasks);
Console.WriteLine("After WhenAll");
}
we can see that with DoWorkPretendAsync the tasks are executed sequentially, while with DoWorkAsync all three start before any of them finishes:
Before WhenAll
Thread: 1 -> task:0 > start
Thread: 1 -> task:0 > done
Thread: 1 -> task:1 > start
Thread: 1 -> task:1 > done
Thread: 1 -> task:2 > start
Thread: 1 -> task:2 > done
After WhenAll
Before WhenAll
Thread: 1 -> task:0 > start
Thread: 1 -> task:1 > start
Thread: 1 -> task:2 > start
Thread: 5 -> task:0 > done
Thread: 5 -> task:2 > done
Thread: 7 -> task:1 > done
After WhenAll
Things to note:
even with real async all tasks are started by the same thread;
in this particular run two of the tasks are finished by the same thread (id: 5). This is not guaranteed at all: a task can be started on one thread and continue later on another thread in the pool.
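The compiler warning quoted earlier suggests await Task.Run(...) for CPU-bound work. A minimal sketch of that, assuming the work is genuinely synchronous: DoWorkBlocking is a hypothetical non-async copy of DoWorkPretendAsync, and each call is queued to the thread pool with Task.Run so the calling thread is free to start the next one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PoolDemo
{
    // Hypothetical non-async copy of DoWorkPretendAsync: it blocks, so it is
    // declared void instead of pretending to be asynchronous.
    static void DoWorkBlocking(int taskId)
    {
        Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > start");
        Thread.Sleep(TimeSpan.FromSeconds(1));
        Console.WriteLine($"Thread: {Thread.CurrentThread.ManagedThreadId} -> task:{taskId} > done");
    }

    public static async Task<TimeSpan> RunOnThreadPoolAsync(int count)
    {
        var sw = Stopwatch.StartNew();
        // Task.Run queues each call to the thread pool and returns immediately,
        // so the caller can start the next call without running the body itself.
        Task[] tasks = Enumerable.Range(0, count)
            .Select(i => Task.Run(() => DoWorkBlocking(i)))
            .ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    public static async Task Main()
    {
        TimeSpan elapsed = await RunOnThreadPoolAsync(3);
        // With enough free pool threads this is about 1 second rather than 3.
        Console.WriteLine($"Elapsed: {elapsed.TotalSeconds:F1} s");
    }
}
```

Applied to the question, the same idea would be Select(i => Task.Run(() => ReadRowsList<T>(...))); Task.Run accepts a delegate that returns a Task<string> and unwraps it. How many calls actually overlap still depends on how many pool threads are available.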
I created a task and passed a timeout to task.Wait(), but the task does not seem to wait for the provided time: Wait returns before the timeout and the task's completed status is false.
using System;
using System.Threading;
using System.Threading.Tasks;
class Test
{
static void Main(string[] args)
{
for(int i = 0 ; i < 10 ; i++)
{
int localValue = i;
Task.Factory.StartNew(() => ProcessTask(localValue));
}
Console.ReadKey();
}
private static void ProcessTask(int thread)
{
var task = Task<int>.Factory.StartNew(() => GetSomeValue());
task.Wait(2000);
if(task.IsCompleted)
{
Console.WriteLine("Completed Thread: " + thread);
}
else
{
Console.WriteLine("Not Completed Thread " + thread);
}
}
private static int GetSomeValue()
{
Thread.Sleep(400);
return 5;
}
}
Update:
I have updated the code. When I ran this code I got the following output.
Only two of the 10 tasks completed, so I want to know what the issue is with this code.
Note: I am running this code on .NET Framework 4.5.2.
The problem isn't that Task.Wait isn't waiting long enough here - it's that you're assuming that as soon as you call Task.Factory.StartNew() (which you should almost never do, btw - use Task.Run instead), the task is started. That's not the case. Task scheduling is a complicated topic, and I don't claim to be an expert, but when you start a lot of tasks at the same time, the thread pool will wait a little while before creating a new thread, to see whether it can reuse an existing one.
You can see this if you add more logging to your code. I added logging before and after the Wait call, and before and after the Sleep call, identifying which original value of i was involved. (I followed your convention of calling that the thread, although that's not quite the case.) The log uses DateTime.UtcNow with a pattern of mm:ss.FFF to show a timestamp down to the millisecond.
Here's the output of the log for a single task that completed:
12:01.657: Before Wait in thread 7
12:03.219: Task for thread 7 started
12:03.623: Task for thread 7 completing
12:03.625: After Wait in thread 7
Here the Wait call returns after less than 2 seconds, but that's fine because the task has completed.
And here's the output of the log for a single task that didn't complete:
12:01.644: Before Wait in thread 6
12:03.412: Task for thread 6 started
12:03.649: After Wait in thread 6
12:03.836: Task for thread 6 completing
Here Wait really has waited for 2 seconds, but the task still hasn't completed, because it only properly started just before the Wait time was up.
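A sketch of the kind of logging described above, for anyone who wants to reproduce it. The Log helper and the exact messages are assumptions, not the answerer's actual code; it uses Task.Run as recommended and checks the bool returned by Wait(int) instead of reading IsCompleted afterwards. How many of the ten calls time out depends on the core count and on the runtime version, since thread pool injection behaviour has changed across .NET releases.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WaitDemo
{
    // Hypothetical helper: timestamp as minutes:seconds.milliseconds.
    static void Log(string message) =>
        Console.WriteLine(DateTime.UtcNow.ToString("mm:ss.FFF", CultureInfo.InvariantCulture) + ": " + message);

    public static int GetSomeValue(int id)
    {
        Log($"Task for thread {id} started");
        Thread.Sleep(400);
        Log($"Task for thread {id} completing");
        return 5;
    }

    public static bool ProcessTask(int id)
    {
        Log($"Before Wait in thread {id}");
        // Queuing the task does not mean a pool thread picks it up right away.
        Task<int> task = Task.Run(() => GetSomeValue(id));
        // Wait(int) returns true only if the task finished within the timeout.
        bool completed = task.Wait(2000);
        Log($"After Wait in thread {id} (completed: {completed})");
        return completed;
    }

    public static void Main()
    {
        var outer = new Task<bool>[10];
        for (int i = 0; i < outer.Length; i++)
        {
            int localValue = i; // copy, so each task sees its own value
            outer[i] = Task.Run(() => ProcessTask(localValue));
        }
        Task.WaitAll(outer);
        int done = outer.Count(t => t.Result);
        Console.WriteLine($"{done} of {outer.Length} completed within the timeout");
    }
}
```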
If you need to wait for task completion, you can use the Result property, which blocks the calling thread until the task finishes.
var task = Task<int>.Factory.StartNew(() => GetSomeValue());
int res = task.Result;
This question already has answers here:
Async/Await vs Threads
(2 answers)
Closed 7 years ago.
I wrote a sample to simulate concurrency using the code below:
var threads = new Thread[200];
//starting threads logic
for (int i = 0; i < 200; i++)
{
threads[i].Start();
}
for (int i = 0; i < 200; i++)
{
threads[i].Join();
}
The code is supposed to insert thousands of records into the database, and it seems to work well, as the threads finish at almost the same time.
But, when I use:
var tasks = new List<Task<int>>();
for (int i = 0; i < 200; i++)
{
tasks.Add(insert(i));
// await insert(i);
}
int[] result = await Task.WhenAll(tasks);
it takes a lot of time to finish the same logic.
Can someone explain the difference? I thought that await would create threads.
If you need to replicate your original Thread-based behaviour, you can use Task.Factory.StartNew(... , TaskCreationOptions.LongRunning) to schedule your work, and then block until the worker tasks complete via Task.WaitAll. I do not recommend this approach, but in terms of behaviour this will be very close to how your code was working previously.
A more in-depth analysis of why you may not be getting the expected performance in your scenario follows.
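A minimal sketch of that not-recommended approach, assuming a blocking insert: InsertBlocking is a hypothetical stand-in that sleeps for 100 ms instead of touching a database. It passes TaskScheduler.Default explicitly because StartNew otherwise uses TaskScheduler.Current, which is not always the default scheduler.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class LongRunningDemo
{
    // Hypothetical stand-in for the blocking database insert in the question.
    public static int InsertBlocking(int i)
    {
        Thread.Sleep(100); // simulates a blocking call
        return i;
    }

    public static int[] RunLikeThreads(int count)
    {
        var tasks = new Task<int>[count];
        for (int i = 0; i < count; i++)
        {
            int n = i; // capture a copy, not the loop variable
            // LongRunning asks the default scheduler for a dedicated thread
            // instead of a thread pool thread, much like new Thread(...).Start().
            tasks[n] = Task.Factory.StartNew(() => InsertBlocking(n),
                CancellationToken.None,
                TaskCreationOptions.LongRunning,
                TaskScheduler.Default);
        }
        // Block until every worker finishes, like the Join loop.
        Task.WaitAll(tasks);
        return Array.ConvertAll(tasks, t => t.Result);
    }

    public static void Main()
    {
        var sw = Stopwatch.StartNew();
        int[] results = RunLikeThreads(200);
        Console.WriteLine($"{results.Length} inserts in {sw.ElapsedMilliseconds} ms");
    }
}
```

With the default scheduler, LongRunning results in one dedicated thread per task, which is why this behaves much like the original Thread/Join code, including the cost of creating 200 threads.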
Explanation, part 1 (async does not mean "on a different thread")
Methods marked with the async keyword do not magically run asynchronously. They are merely capable of combining awaitable operations (that may or may not run asynchronously themselves), into a single larger unit (generally Task or Task<T>).
If your insert method is async, it is still likely that it performs at least some of its work synchronously. This is definitely the case for all of the code preceding the first await statement. That work executes on the "main" thread (the thread that calls insert), and it will be your bottleneck, or at least part of it, because the degree of parallelism for that section of your code is 1 while you're calling insert in a tight loop, regardless of whether you await the resulting task.
To illustrate the above point, consider the following example:
void Test()
{
Debug.Print($"Kicking off async chain (thread {Thread.CurrentThread.ManagedThreadId}) - this is the main thread");
OuterTask().Wait(); // Do not block on Tasks - educational purposes only.
}
async Task OuterTask()
{
Debug.Print($"OuterTask before await (thread {Thread.CurrentThread.ManagedThreadId})");
await InnerTask().ConfigureAwait(false);
Debug.Print($"OuterTask after await (thread {Thread.CurrentThread.ManagedThreadId})");
}
async Task InnerTask()
{
Debug.Print($"InnerTask before await (thread {Thread.CurrentThread.ManagedThreadId})");
await Task.Delay(10).ConfigureAwait(false);
Debug.Print($"InnerTask after await (thread {Thread.CurrentThread.ManagedThreadId}) - we are now on the thread pool");
}
This produces the following output:
Kicking off async chain (thread 6) - this is the main thread
OuterTask before await (thread 6)
InnerTask before await (thread 6)
InnerTask after await (thread 8) - we are now on the thread pool
OuterTask after await (thread 8)
Note that the code before the first await inside OuterTask, and even inside InnerTask, still executes on the "main" thread. Our chain actually executes synchronously, on the same thread that kicked off the outer task, until we await the first truly async operation (in this case Task.Delay).
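Given that the synchronous prefix of each insert call runs on the calling thread, one way to let those prefixes overlap is to start each call through Task.Run. The sketch below assumes an insert shaped like the hypothetical InsertAsync: 50 ms of blocking work before its first await, then 50 ms of genuinely asynchronous waiting. It times both ways of starting 20 calls.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class OffloadDemo
{
    // Hypothetical insert: blocking work first, then a truly asynchronous wait.
    public static async Task<int> InsertAsync(int i)
    {
        Thread.Sleep(50);      // runs on whichever thread called InsertAsync
        await Task.Delay(50);  // first truly asynchronous operation
        return i;
    }

    public static async Task<long> TimeAsync(Func<int, Task<int>> start, int count)
    {
        var sw = Stopwatch.StartNew();
        // WhenAll enumerates the query here, which is when each call starts.
        await Task.WhenAll(Enumerable.Range(0, count).Select(start));
        return sw.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        // Direct calls: the 20 synchronous prefixes run one after another on this thread.
        long direct = await TimeAsync(InsertAsync, 20);
        // Task.Run: each prefix is queued to the thread pool and may overlap with others.
        long offloaded = await TimeAsync(i => Task.Run(() => InsertAsync(i)), 20);
        Console.WriteLine($"direct: {direct} ms, via Task.Run: {offloaded} ms");
    }
}
```

With direct calls the twenty 50 ms prefixes add up to at least a second on the calling thread; with Task.Run they can run on several pool threads at once, so on a multi-core machine the second timing is typically lower.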
Additionally
If you are running in an environment where SynchronizationContext.Current is not null (e.g. Windows Forms, WPF) and you're not using ConfigureAwait(false) on the tasks awaited inside your insert method, then continuations scheduled by the async state machine after the first await statement will also likely execute on the "main" thread, although this is not guaranteed in certain environments (e.g. ASP.NET).
Explanation, part 2 (executing Tasks on the thread pool)
If, as part of your insert method, you are opting to start any Tasks manually, then you are most likely scheduling your work on the thread pool by using Task.Run or any other method of starting a new task that does not specify TaskCreationOptions.LongRunning. Once the thread pool gets saturated any newly started tasks will be queued, thus reducing the throughput of your parallel system.
Proof:
IEnumerable<Task> tasks = Enumerable
.Range(0, 200)
.Select(_ => Task.Run(() => Thread.Sleep(100))); // Using Thread.Sleep to simulate blocking calls.
await Task.WhenAll(tasks); // Completes in 2+ seconds.
Now with TaskCreationOptions.LongRunning:
IEnumerable<Task> tasks = Enumerable
.Range(0, 200)
.Select(_ => Task.Factory.StartNew(
() => Thread.Sleep(100), TaskCreationOptions.LongRunning
));
await Task.WhenAll(tasks); // Completes in under 130 milliseconds.
It is generally not a good idea to spawn 200 threads (this will not scale well), but if massive parallelisation of blocking calls is an absolute requirement, the above snippet shows you one way to do it with TPL.
In the first example you created threads manually. In the second you created tasks. Tasks (most likely) run on the thread pool, which has a limited number of threads. So most tasks are waiting in the queue, while a few of them execute in parallel on the available threads.
I was playing around with the TPL and found something very strange.
My code waits for the tasks to end, but in this dummy test I found that a couple of tasks were executed after the Wait call. Am I missing something, or is this a TPL problem?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
Task tareas = null;
Console.WriteLine("Start process");
for (int i = 0; i < 4000; i++)
{
var n = i.ToString();
tareas = Task.Factory.StartNew(() =>
{
var random = new Random();
Thread.Sleep(random.Next(200, 500));
Console.WriteLine("Task completed: " + n);
});
}
tareas.Wait();
Console.WriteLine("Process end");
Console.Read();
}
}
}
Output:
Start process
Task completed: 4
Task completed: 3
...
Process end
Task completed: 3996
Task completed: 3991
Task completed: 3993
Well, tareas.Wait(); only waits for the last task you created to complete. As your tasks take a random amount of time to complete, it's perfectly normal that some of them are not yet completed when the last one you created completes.
Here's an example of what could happen:
Task 1 created
Task 2 created
Task 3 created
Task 4 created
Task 1 completed
Task 3 completed
Wait for task 4 completion
Task 4 completed
// task 2 ran longer than the other tasks because of
// the random amount of time its thread slept
Task 2 completed
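A corrected sketch of the question's program: it keeps every task in a list and waits on all of them with Task.WaitAll instead of only on the last one. RunAll and the completed counter are illustrative additions so the result can be checked, and Task.Run is used in place of Task.Factory.StartNew.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class WaitAllDemo
{
    public static int RunAll(int count)
    {
        int completed = 0;
        var tareas = new List<Task>(count); // keep every task, not just the last one
        for (int i = 0; i < count; i++)
        {
            var n = i.ToString();
            tareas.Add(Task.Run(() =>
            {
                // One Random per task, as in the question (Random is not thread-safe).
                var random = new Random();
                Thread.Sleep(random.Next(200, 500));
                Interlocked.Increment(ref completed);
                Console.WriteLine("Task completed: " + n);
            }));
        }
        Task.WaitAll(tareas.ToArray()); // returns only after every task has finished
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("Start process");
        int completed = RunAll(4000);
        Console.WriteLine($"Process end ({completed} tasks completed)");
    }
}
```

With this change "Process end" is printed only after every task has written its "Task completed" line. Expect 4000 tasks that each sleep 200 to 500 ms on the thread pool to take a while.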
As others have noted, you're waiting on only a single task, the last one scheduled.
Tasks scheduled on the default TaskScheduler use the thread pool to execute the work items as described here. With this scheduler, ordering is not guaranteed. The framework will attempt to balance the queues running on each thread in the pool, but it is possible (and not uncommon) that the queue in which the last scheduled task is put will finish running through all its tasks before the other queues.
This is without even considering the Wait. Check out the section labeled “Task Inlining” in the document I linked – if a task is being waited on, the thread executing the wait can say 'well, I can't do anything until this task is complete, let me see if I can go ahead and execute it myself'. In that case, the task gets removed from the thread pool queue entirely.
tareas holds a reference only to the last task you created, so tareas.Wait() waits only for that task to finish; the other tasks may still be running after it has finished.
My laptop has 2 logical processors, and I stumbled upon a scenario where, if I schedule 2 tasks that take longer than 1 second without designating them long-running, subsequent tasks are started only after 1 second has elapsed. Is it possible to change this timeout?
I know normal tasks should be short-running (much shorter than a second if possible); I'm just wondering whether I am seeing hard-coded TPL behavior or whether I can influence it in any way other than designating tasks long-running.
This Console app method should demonstrate the behavior for a machine with any number of processors:
static void Main(string[] args)
{
var timer = new Stopwatch();
timer.Start();
int numberOfTasks = Environment.ProcessorCount;
var rudeTasks = new List<Task>();
var shortTasks = new List<Task>();
for (int index = 0; index < numberOfTasks; index++)
{
int capturedIndex = index;
rudeTasks.Add(Task.Factory.StartNew(() =>
{
Console.WriteLine("Starting rude task {0} at {1}ms", capturedIndex, timer.ElapsedMilliseconds);
Thread.Sleep(5000);
}));
}
for (int index = 0; index < numberOfTasks; index++)
{
int capturedIndex = index;
shortTasks.Add(Task.Factory.StartNew(() =>
{
Console.WriteLine("Short-running task {0} running at {1}ms", capturedIndex, timer.ElapsedMilliseconds);
}));
}
Task.WaitAll(shortTasks.ToArray());
Console.WriteLine("Finished waiting for short tasks at {0}ms", timer.ElapsedMilliseconds);
Task.WaitAll(rudeTasks.ToArray());
Console.WriteLine("Finished waiting for rude tasks at {0}ms", timer.ElapsedMilliseconds);
Console.ReadLine();
}
Here is the app's output on my 2 proc laptop:
Starting rude task 0 at 2ms
Starting rude task 1 at 2ms
Short-running task 0 running at 1002ms
Short-running task 1 running at 1002ms
Finished waiting for short tasks at 1002ms
Finished waiting for rude tasks at 5004ms
Press any key to continue . . .
The lines:
Short-running task 0 running at 1002ms
Short-running task 1 running at 1002ms
indicate that there is a 1 second timeout or something of that nature allowing the shorter-running tasks to get scheduled over the 'rude' tasks. That's what I'm inquiring about.
The behavior that you are seeing is not specific to the TPL, it's specific to the TPL's default scheduler. The scheduler is attempting to increase the number of threads so that those two that are running don't "hog" the CPU and choke out the others. It's also helpful in avoiding deadlock situations if the two that are running start and wait on Tasks themselves.
If you want to change the scheduling behavior, you might want to look into implementing your own TaskScheduler.
This is standard behavior for the thread pool scheduler. It tries to keep the number of active threads equal to the number of cores, but it can't do the job well when your tasks spend their time blocking instead of running (sleeping, in your case). Twice a second it allows another thread to run to try to work down the backlog. It seems you have a dual-core CPU.
The proper workaround is to use TaskCreationOptions.LongRunning so the scheduler uses a regular Thread instead of a threadpool thread. An improper workaround is to use ThreadPool.SetMinThreads. But you should perhaps focus on doing real work in your tasks, Sleep() is not a very good simulation of that.
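A sketch of the ThreadPool.SetMinThreads workaround mentioned above (the one the answer calls improper), assuming you only want to observe its effect. MeasureShortTaskDelay is a hypothetical helper: it blocks as many pool threads as there are cores for 2 seconds and reports how long a short task waits before it runs. Raising the minimum lets the pool create threads on demand up to that number instead of waiting for its injection heuristic; the setting applies to the whole process.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class MinThreadsDemo
{
    public static long MeasureShortTaskDelay(int rudeCount)
    {
        var timer = Stopwatch.StartNew();
        var rude = new Task[rudeCount];
        for (int i = 0; i < rudeCount; i++)
            rude[i] = Task.Run(() => Thread.Sleep(2000)); // blocks a pool thread
        // How long until a short task gets a pool thread of its own?
        long startedAt = Task.Run(() => timer.ElapsedMilliseconds).Result;
        Task.WaitAll(rude);
        return startedAt;
    }

    public static void Main()
    {
        ThreadPool.GetMinThreads(out int workerMin, out int ioMin);
        Console.WriteLine($"Default minimum worker threads: {workerMin}");
        // Raise the floor so the blocked threads do not starve the short task.
        ThreadPool.SetMinThreads(workerMin + Environment.ProcessorCount, ioMin);
        long delay = MeasureShortTaskDelay(Environment.ProcessorCount);
        Console.WriteLine($"Short task started after {delay} ms");
    }
}
```

As the answer says, LongRunning is the cleaner fix; changing the minimum affects every piece of code in the process that uses the pool.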
The problem is that it takes a while for the scheduler to start the new tasks, as it tries to determine whether a task is long-running. You can tell the TPL that a task is long-running by passing an option when you create it:
for (int index = 0; index < numberOfTasks; index++)
{
int capturedIndex = index;
rudeTasks.Add(Task.Factory.StartNew(() =>
{
Console.WriteLine("Starting rude task {0} at {1}ms", capturedIndex, timer.ElapsedMilliseconds);
Thread.Sleep(3000);
}, TaskCreationOptions.LongRunning));
}
Resulting in:
Starting rude task 0 at 11ms
Starting rude task 1 at 13ms
Starting rude task 2 at 15ms
Starting rude task 3 at 19ms
Short-running task 0 running at 45ms
Short-running task 1 running at 45ms
Short-running task 2 running at 45ms
Short-running task 3 running at 45ms
Finished waiting for short tasks at 46ms
Finished waiting for rude tasks at 3019ms