I'm new to tasks and have a question regarding the usage. Does the Task.Factory fire for all items in the foreach loop or block at the 'await' basically making the program single threaded? If I am thinking about this correctly, the foreach loop starts all the tasks and the .GetAwaiter().GetResult(); is blocking the main thread until the last task is complete.
Also, I'm just wanting some anonymous tasks to load the data. Would this be a correct implementation? I'm not referring to exception handling as this is simply an example.
To provide clarity, I am loading data into a database from an outside API. This one is using the FRED database. (https://fred.stlouisfed.org/), but I have several I will hit to complete the entire transfer (maybe 200k data points). Once they are done I update the tables, refresh market calculations, etc. Some of it is real time and some of it is End-of-day. I would also like to say, I currently have everything working in docker, but have been working to update the code using tasks to improve execution.
class Program
{
private async Task SQLBulkLoader()
{
foreach (var fileListObj in indicators.file_list)
{
await Task.Factory.StartNew( () =>
{
string json = this.GET(//API call);
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = fileListObj.series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
s.WriteToServer(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", fileListObj.series_id);
}
});
}
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
Your awaiting the task returned from Task.Factory.StartNew does make it effectively single threaded. You can see a simple demonstration of this with this short LinqPad example:
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
await Task.Run(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
});
}
Here we wait less as we progress through the loop. The output is:
0 inline
0 in thread
1 inline
1 in thread
2 inline
2 in thread
If you remove the await in front of StartNew you'll see it runs in parallel. As others have mentioned, you can certainly use Parallel.ForEach, but for a demonstration of doing it a bit more manually, you can consider a solution like this:
var tasks = new List<Task>();
for (var i = 0; i < 3; i++)
{
var index = i;
$"{index} inline".Dump();
tasks.Add(Task.Factory.StartNew(() =>
{
Thread.Sleep((3 - index) * 1000);
$"{index} in thread".Dump();
}));
}
Task.WaitAll(tasks.ToArray());
Notice now how the result is:
0 inline
1 inline
2 inline
2 in thread
1 in thread
0 in thread
This is a typical problem that C# 8.0 Async Streams are going to solve very soon.
Until C# 8.0 is released, you can use the AsyncEnumarator library:
using System.Collections.Async;
class Program
{
private async Task SQLBulkLoader() {
await indicators.file_list.ParallelForEachAsync(async fileListObj =>
{
...
await s.WriteToServerAsync(dataTableConversion);
...
},
maxDegreeOfParalellism: 3,
cancellationToken: default);
}
static void Main(string[] args)
{
Program worker = new Program();
worker.SQLBulkLoader().GetAwaiter().GetResult();
}
}
I do not recommend using Parallel.ForEach and Task.WhenAll as those functions are not designed for asynchronous streaming.
You'll want to add each task to a collection and then use Task.WhenAll to await all of the tasks in that collection:
private async Task SQLBulkLoader()
{
var tasks = new List<Task>();
foreach (var fileListObj in indicators.file_list)
{
tasks.Add(Task.Factory.StartNew( () => { //Doing Stuff }));
}
await Task.WhenAll(tasks.ToArray());
}
My take on this: most time consuming operations will be getting the data using a GET operation and the actual call to WriteToServer using SqlBulkCopy. If you take a look at that class you will see that there is a native async method WriteToServerAsync method (docs here)
. Always use those before creating Tasks yourself using Task.Run.
The same applies to the http GET call. You can use the native HttpClient.GetAsync (docs here) for that.
Doing that you can rewrite your code to this:
private async Task ProcessFileAsync(string series_id)
{
string json = await GetAsync();
SeriesObject obj = JsonConvert.DeserializeObject<SeriesObject>(json);
DataTable dataTableConversion = ConvertToDataTable(obj.observations);
dataTableConversion.TableName = series_id;
using (SqlConnection dbConnection = new SqlConnection("SQL Connection"))
{
dbConnection.Open();
using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
{
s.DestinationTableName = dataTableConversion.TableName;
foreach (var column in dataTableConversion.Columns)
s.ColumnMappings.Add(column.ToString(), column.ToString());
await s.WriteToServerAsync(dataTableConversion);
}
Console.WriteLine("File: {0} Complete", series_id);
}
}
private async Task SQLBulkLoaderAsync()
{
var tasks = indicators.file_list.Select(f => ProcessFileAsync(f.series_id));
await Task.WhenAll(tasks);
}
Both operations (http call and sql server call) are I/O calls. Using the native async/await pattern there won't even be a thread created or used, see this question for a more in-depth explanation. That is why for IO bound operations you should never have to use Task.Run (or Task.Factory.StartNew. But do mind that Task.Run is the recommended approach).
Sidenote: if you are using HttpClient in a loop, please read this about how to correctly use it.
If you need to limit the number of parallel actions you could also use TPL Dataflow as it plays very nice with Task based IO bound operations. The SQLBulkLoaderAsyncshould then be modified to (leaving the ProcessFileAsync method from earlier this answer intact):
private async Task SQLBulkLoaderAsync()
{
var ab = new ActionBlock<string>(ProcessFileAsync, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (var file in indicators.file_list)
{
ab.Post(file.series_id);
}
ab.Complete();
await ab.Completion;
}
Use a Parallel.ForEach loop to enable data parallelism over any System.Collections.Generic.IEnumerable<T> source.
// Method signature: Parallel.ForEach(IEnumerable<TSource> source, Action<TSource> body)
Parallel.ForEach(fileList, (currentFile) =>
{
//Doing Stuff
Console.WriteLine("Processing {0} on thread {1}", currentFile, Thread.CurrentThread.ManagedThreadId);
});
Why didnt you try this :), this program will not start parallel tasks (in a foreach), it will be blocking but the logic in task will be done in separate thread from threadpool (only one at the time, but the main thread will be blocked).
The proper approach in your situation is to use Paraller.ForEach
How can I convert this foreach code to Parallel.ForEach?
Related
Here I have the following piece of code:
var tasks = new List<Task>();
var stopwatch = new Stopwatch();
for (var i = 0; i < 100; i++)
{
var person = new Person { Id = i };
list.Add(person);
}
stopwatch.Start();
foreach (var item in list)
{
var task = Task.Run(async () =>
{
await Task.Delay(1000);
Console.WriteLine("Hi");
});
tasks.Add(task);
}
await Task.WhenAll(tasks);
stopwatch.Stop();
I assume that I will have about 100 seconds in the result of the stopwatch.
But I have 1,1092223.
I think I missing something, can you help me to explain why?
I assume that your confusion might come from the await keyword in await Task.Delay(1000);
But this holds only for the innerworking of the taskmethod. Inside the loop the next iteration will be performed immidiately after Task.Run is executed. So all Tasks will be started in close succession and then run in parallel. (As far as the system has free threads at hand of course) The system takes care how, when and in which order they can be executed.
In the end in this line:
await Task.WhenAll(tasks);
you actually wait for the slowest of them (or the one started as last).
To fullfill your expectation your code should actually look like this:
public async Task RunAsPseudoParallel()
{
List<Person> list = new List<Person>();
var stopwatch = new Stopwatch();
for (var i = 0; i < 100; i++)
{
var person = new Person { Id = i };
list.Add(person);
}
stopwatch.Start();
foreach (var item in list)
{
await Task.Run(async () =>
{
await Task.Delay(1000);
Console.WriteLine("Hi");
});
}
stopwatch.Stop()
}
Disclaimer: But this code is quite nonsensical, because it uses async functionality to implement a synchronous process. In this scenario you can simply leave out the Task.Run call and use a simple Thread.Sleep(1000).
Delays are always approximate.
You are limited by when the task scheduler chooses to run the delegate you pass to Task.Run. It may be executing other tasks and be unwilling to start up more threads. Or, it may launch a new thread -- which while not slow is also not free and costs time too.
You are limited by when the task scheduler chooses to resume your code after the delay completes.
You're also limited by the OS scheduler, which may be allocating CPU time to other processes/threads and end up delaying the thread that would execute your code.
Because you are launching multiple tasks, you are seeing all of these per-task variables compound into an even larger delay.
I have few methods that report some data to Data base. We want to invoke all calls to Data service asynchronously. These calls to data service are all over and so we want to make sure that these DS calls are executed one after another in order at any given time. Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' . Will this be appropriate in my use case?
Or what other options will suit my use case? Please, guide me.
So, what i have in my code currently
public static SemaphoreSlim mutex = new SemaphoreSlim(1);
//first DS call
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
//await mutex.WaitAsync(); **//is this correct way to use SemaphoreSlim ?**
foreach (var setting in Module.param)
{
Task job1 = SaveModule(setting);
tasks1.Add(job1);
Task job2= SaveModule(GetAdvancedData(setting));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
//mutex.Release(); // **is this correct?**
}
private async Task SaveModule(Module setting)
{
await Task.Run(() =>
{
// Invokes Calls to DS
...
});
}
//somewhere down the main thread, invoking second call to DS
//Second DS Call
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
//await mutex.WaitAsync();// **is this correct?**
await Task.Run(() =>
{
//TrackInstrumentInfoToDS
//mutex.Release();// **is this correct?**
});
if(param2)
{
await Task.Run(() =>
{
//TrackParam2InstrumentInfoToDS
});
}
}
Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' .
SemaphoreSlim does restrict asynchronous code to running one at a time, and is a valid form of mutual exclusion. However, since "out of sequence" calls can cause errors, then SemaphoreSlim is not an appropriate solution since it does not guarantee FIFO.
In a more general sense, no synchronization primitive guarantees FIFO because that can cause problems due to side effects like lock convoys. On the other hand, it is natural for data structures to be strictly FIFO.
So, you'll need to use your own FIFO queue, rather than having an implicit execution queue. Channels is a nice, performant, async-compatible queue, but since you're on an older version of C#/.NET, BlockingCollection<T> would work:
public sealed class ExecutionQueue
{
private readonly BlockingCollection<Func<Task>> _queue = new BlockingCollection<Func<Task>>();
public ExecutionQueue() => Completion = Task.Run(() => ProcessQueueAsync());
public Task Completion { get; }
public void Complete() => _queue.CompleteAdding();
private async Task ProcessQueueAsync()
{
foreach (var value in _queue.GetConsumingEnumerable())
await value();
}
}
The only tricky part with this setup is how to queue work. From the perspective of the code queueing the work, they want to know when the lambda is executed, not when the lambda is queued. From the perspective of the queue method (which I'm calling Run), the method needs to complete its returned task only after the lambda is executed. So, you can write the queue method something like this:
public Task Run(Func<Task> lambda)
{
var tcs = new TaskCompletionSource<object>();
_queue.Add(async () =>
{
// Execute the lambda and propagate the results to the Task returned from Run
try
{
await lambda();
tcs.TrySetResult(null);
}
catch (OperationCanceledException ex)
{
tcs.TrySetCanceled(ex.CancellationToken);
}
catch (Exception ex)
{
tcs.TrySetException(ex);
}
});
return tcs.Task;
}
This queueing method isn't as perfect as it could be. If a task completes with more than one exception (this is normal for parallel code), only the first one is retained (this is normal for async code). There's also an edge case around OperationCanceledException handling. But this code is good enough for most cases.
Now you can use it like this:
public static ExecutionQueue _queue = new ExecutionQueue();
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
foreach (var setting in Module.param)
{
Task job1 = _queue.Run(() => SaveModule(setting));
tasks1.Add(job1);
Task job2 = _queue.Run(() => SaveModule(GetAdvancedData(setting)));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
}
Here's a compact solution that has the least amount of moving parts but still guarantees FIFO ordering (unlike some of the suggested SemaphoreSlim solutions). There are two overloads for Enqueue so you can enqueue tasks with and without return values.
using System;
using System.Threading;
using System.Threading.Tasks;
public class TaskQueue
{
private Task _previousTask = Task.CompletedTask;
public Task Enqueue(Func<Task> asyncAction)
{
return Enqueue(async () => {
await asyncAction().ConfigureAwait(false);
return true;
});
}
public async Task<T> Enqueue<T>(Func<Task<T>> asyncFunction)
{
var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
// get predecessor and wait until it's done. Also atomically swap in our own completion task.
await Interlocked.Exchange(ref _previousTask, tcs.Task).ConfigureAwait(false);
try
{
return await asyncFunction().ConfigureAwait(false);
}
finally
{
tcs.SetResult();
}
}
}
Please keep in mind that your first solution queueing all tasks to lists doesn't ensure that the tasks are executed one after another. They're all running in parallel because they're not awaited until the next tasks is startet.
So yes you've to use a SemapohoreSlim to use async locking and await. A simple implementation might be:
private readonly SemaphoreSlim _syncRoot = new SemaphoreSlim(1);
public async Task SendModuleDataToDSAsync(Module parameters)
{
await this._syncRoot.WaitAsync();
try
{
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
finally
{
this._syncRoot.Release();
}
}
If you can use Nito.AsyncEx the code can be simplified to:
public async Task SendModuleDataToDSAsync(Module parameters)
{
using var lockHandle = await this._syncRoot.LockAsync();
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
One option is to queue operations that will create tasks instead of queuing already running tasks as the code in the question does.
PseudoCode without locking:
Queue<Func<Task>> tasksQueue = new Queue<Func<Task>>();
async Task RunAllTasks()
{
while (tasksQueue.Count > 0)
{
var taskCreator = tasksQueue.Dequeu(); // get creator
var task = taskCreator(); // staring one task at a time here
await task; // wait till task completes
}
}
// note that declaring createSaveModuleTask does not
// start SaveModule task - it will only happen after this func is invoked
// inside RunAllTasks
Func<Task> createSaveModuleTask = () => SaveModule(setting);
tasksQueue.Add(createSaveModuleTask);
tasksQueue.Add(() => SaveModule(GetAdvancedData(setting)));
// no DB operations started at this point
// this will start tasks from the queue one by one.
await RunAllTasks();
Using ConcurrentQueue would be likely be right thing in actual code. You also would need to know total number of expected operations to stop when all are started and awaited one after another.
Building on your comment under Alexeis answer, your approch with the SemaphoreSlim is correct.
Assumeing that the methods SendInstrumentSettingsToDS and SendModuleDataToDSAsync are members of the same class. You simplay need a instance variable for a SemaphoreSlim and then at the start of each methode that needs synchornization call await lock.WaitAsync() and call lock.Release() in the finally block.
public async Task SendModuleDataToDSAsync(Module parameters)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
and it is importend that the call to lock.Release() is in the finally-block, so that if an exception is thrown somewhere in the code of the try-block the semaphore is released.
I have an IEnumerable<Task>, where each Task will call the same endpoint. However, the endpoint can only handle so many calls per second. How can I put, say, a half second delay between each call?
I have tried adding Task.Delay(), but of course awaiting them simply means that the app waits a half second before sending all the calls at once.
Here is a code snippet:
var resultTasks = orders
.Select(async task =>
{
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
}
catch(Exception ex)
{
result.Exception = ex;
}
return result;
} );
var results = Task.WhenAll(resultTasks);
I feel like I should do something like
Task.WhenAll(resultTasks.EmitOverTime(500));
... but how exactly do I do that?
What you describe in your question is in other words rate limiting. You'd like to apply rate limiting policy to your client, because the API you use enforces such a policy on the server to protect itself from abuse.
While you could implement rate limiting yourself, I'd recommend you to go with some well established solution. Rate Limiter from Davis Desmaisons was the one that I picked at random and I instantly liked it. It had solid documentation, superior coverage and was easy to use. It is also available as NuGet package.
Check out the simple snippet below that demonstrates running semi-overlapping tasks in sequence while defering the task start by half a second after the immediately preceding task started. Each task lasts at least 750 ms.
using ComposableAsync;
using RateLimiter;
using System;
using System.Threading.Tasks;
namespace RateLimiterTest
{
class Program
{
static void Main(string[] args)
{
Log("Starting tasks ...");
var constraint = TimeLimiter.GetFromMaxCountByInterval(1, TimeSpan.FromSeconds(0.5));
var tasks = new[]
{
DoWorkAsync("Task1", constraint),
DoWorkAsync("Task2", constraint),
DoWorkAsync("Task3", constraint),
DoWorkAsync("Task4", constraint)
};
Task.WaitAll(tasks);
Log("All tasks finished.");
Console.ReadLine();
}
static void Log(string message)
{
Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.fff ") + message);
}
static async Task DoWorkAsync(string name, IDispatcher constraint)
{
await constraint;
Log(name + " started");
await Task.Delay(750);
Log(name + " finished");
}
}
}
Sample output:
10:03:27.121 Starting tasks ...
10:03:27.154 Task1 started
10:03:27.658 Task2 started
10:03:27.911 Task1 finished
10:03:28.160 Task3 started
10:03:28.410 Task2 finished
10:03:28.680 Task4 started
10:03:28.913 Task3 finished
10:03:29.443 Task4 finished
10:03:29.443 All tasks finished.
If you change the constraint to allow maximum two tasks per second (var constraint = TimeLimiter.GetFromMaxCountByInterval(2, TimeSpan.FromSeconds(1));), which is not the same as one per half a second, then the output could be like:
10:06:03.237 Starting tasks ...
10:06:03.264 Task1 started
10:06:03.268 Task2 started
10:06:04.026 Task2 finished
10:06:04.031 Task1 finished
10:06:04.275 Task3 started
10:06:04.276 Task4 started
10:06:05.032 Task4 finished
10:06:05.032 Task3 finished
10:06:05.033 All tasks finished.
Note that the current version of Rate Limiter targets .NETFramework 4.7.2+ or .NETStandard 2.0+.
This is just a thought, but another approach could be to create a queue and add another thread that runs polling the queue for calls that need to go out to your endpoint.
Have you considered just turning that into a foreach-loop with a Task.Delay call? You seem to want to explicitly call them sequentially, it won't hurt if that is obvious from your code.
var results = new List<YourResultType>;
foreach(var order in orders){
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
results.Add(result.Response);
}
catch(Exception ex)
{
result.Exception = ex;
}
}
Instead of selecting from orders you could loop over them, and inside the loop put the result into a list and then call Task.WhenAll.
Would look something like:
var resultTasks = new List<VendorTaskResult>(orders.Count);
orders.ToList().ForEach( item => {
var result = new VendorTaskResult();
try
{
result.Response = await result.CallVendorAsync();
}
catch(Exception ex)
{
result.Exception = ex;
}
resultTasks.Add(result);
Thread.Sleep(x);
});
var results = Task.WhenAll(resultTasks);
If you want to control the number of requests executed simultaneously, you have to use a semaphore.
I have something very similar, and it works fine with me. Please note that I call ToArray() after the Linq query finishes, that triggers the tasks:
using (HttpClient client = new HttpClient()) {
IEnumerable<Task<string>> _downloads = _group
.Select(job => {
await Task.Delay(300);
return client.GetStringAsync(<url with variable job>);
});
Task<string>[] _downloadTasks = _downloads.ToArray();
_pages = await Task.WhenAll(_downloadTasks);
}
Now please note that this will create n nunmber of tasks, all in parallel, and the Task.Delay literally does nothing. If you want to call the pages synchronously (as it sounds by putting a delay between the calls), then this code may be better:
using (HttpClient client = new HttpClient()) {
foreach (string job in _group) {
await Task.Delay(300);
_pages.Add(await client.GetStringAsync(<url with variable job>));
}
}
The download of the pages is still asynchronous (while downloading other tasks are done), but each call to download the page is synchronous, ensuring that you can wait for one to finish in order to call the next one.
The code can be easily changed to call the pages asynchronously in chunks, like every 10 pages, wait 300ms, like in this sample:
IEnumerable<string[]> toParse = myData
.Select((v, i) => new { v.code, group = i / 20 })
.GroupBy(x => x.group)
.Select(g => g.Select(x => x.code).ToArray());
using (HttpClient client = new HttpClient()) {
foreach (string[] _group in toParse) {
string[] _pages = null;
IEnumerable<Task<string>> _downloads = _group
.Select(job => {
return client.GetStringAsync(<url with job>);
});
Task<string>[] _downloadTasks = _downloads.ToArray();
_pages = await Task.WhenAll(_downloadTasks);
await Task.Delay(5000);
}
}
All this does is group your pages in chunks of 20, iterate through the chunks, download all pages of the chunk asynchronously, wait 5 seconds, move on to the next chunk.
I hope that is what you were waiting for :)
The proposed method EmitOverTime is doable, but only by blocking the current thread:
public static IEnumerable<Task<TResult>> EmitOverTime<TResult>(
this IEnumerable<Task<TResult>> tasks, int delay)
{
foreach (var item in tasks)
{
Thread.Sleep(delay); // Delay by blocking
yield return item;
}
}
Usage:
var results = await Task.WhenAll(resultTasks.EmitOverTime(500));
Probably better is to create a variant of Task.WhenAll that accepts a delay argument, and delays asyncronously:
public static async Task<TResult[]> WhenAllWithDelay<TResult>(
IEnumerable<Task<TResult>> tasks, int delay)
{
var tasksList = new List<Task<TResult>>();
foreach (var task in tasks)
{
await Task.Delay(delay).ConfigureAwait(false);
tasksList.Add(task);
}
return await Task.WhenAll(tasksList).ConfigureAwait(false);
}
Usage:
var results = await WhenAllWithDelay(resultTasks, 500);
This design implies that the enumerable of tasks should be enumerated only once. It is easy to forget this during development, and start enumerating it again, spawning a new set of tasks. For this reason I propose to make it an OnlyOnce enumerable, as it is shown in this question.
Update: I should mention why the above methods work, and under what premise. The premise is that the supplied IEnumerable<Task<TResult>> is deferred, in other words non-materialized. At the method's start there are no tasks created yet. The tasks are created one after the other during the enumeration of the enumerable, and the trick is that the enumeration is slow and controlled. The delay inside the loop ensures that the tasks are not created all at once. They are created hot (in other words already started), so at the time the last task has been created some of the first tasks may have already been completed. The materialized list of half-running/half-completed tasks is then passed to Task.WhenAll, that waits for all to complete asynchronously.
I started to have HUGE doubts regarding my code and I need some advice from more experienced programmers.
In my application on the button click, the application runs a command, that is calling a ScrapJockeys method:
if (UpdateJockeysPl) await ScrapJockeys(JPlFrom, JPlTo + 1, "jockeysPl"); //1 - 1049
ScrapJockeys is triggering a for loop, repeating code block between 20K - 150K times (depends on the case). Inside the loop, I need to call a service method, where the execution of the method takes a lot of time. Also, I wanted to have the ability of cancellation of the loop and everything that is going on inside of the loop/method.
Right now I am with a method with a list of tasks, and inside of the loop is triggered a Task.Run. Inside of each task, I am calling an awaited service method, which reduces execution time of everything to 1/4 comparing to synchronous code. Also, each task has assigned a cancellation token, like in the example GitHub link:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
int j = i;
Task task = Task.Run(async () =>
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j);
//doing some stuff with results in here
}, TokenSource.Token);
tasks.Add(task);
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
So about my question, is there everything fine with my code? According to this article:
Many async newbies start off by trying to treat asynchronous tasks the
same as parallel (TPL) tasks and this is a major misstep.
What should I use then?
And according to this article:
On a busy server, this kind of implementation can kill scalability.
So how am I supposed to do it?
Please be noted, that the service interface method signature is Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index);
And also I am not 100% sure that I am using Task.Run correctly within my service class. The methods inside are wrapping the code inside await Task.Run(() =>, like in the example GitHub link:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
await Task.Run(() =>
{
//do some time consuming things
});
return jockey;
}
As far as I understand from the articles, this is a kind of anti-pattern. But I am confused a bit. Based on this SO reply, it should be fine...? If not, how to replace it?
On the UI side, you should be using Task.Run when you have CPU-bound code that is long enough that you need to move it off the UI thread. This is completely different than the server side, where using Task.Run at all is an anti-pattern.
In your case, all your code seems to be I/O-based, so I don't see a need for Task.Run at all.
There is a statement in your question that conflicts with the provided code:
I am calling an awaited service method
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
await Task.Run(() =>
{
//do some time consuming things
});
}
The lambda passed to Task.Run is not async, so the service method cannot possibly be awaited. And indeed it is not.
A better solution would be to load the HTML asynchronously (e.g., using HttpClient.GetStringAsync), and then call HtmlDocument.LoadHtml, something like this:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
...
string link = sb.ToString();
var html = await httpClient.GetStringAsync(link).ConfigureAwait(false);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
if (jockey.Name == null)
...
return jockey;
}
And also remove the Task.Run from your for loop:
private async Task ScrapJockey(string dataType)
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j).ConfigureAwait(false);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j).ConfigureAwait(false);
//doing some stuff with results in here
}
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
tasks.Add(ScrapJockey(dataType));
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
As far as I understand from the articles, this is a kind of anti-pattern.
It is an anti-pattern. But if can't modify the service implementation, you should at least be able to execute the tasks in parallel. Something like this:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
ConcurrentBag<Task> tasks = new ConcurrentBag<Task>();
ParallelOptions parallelLoopOptions = new ParallelOptions() { CancellationToken = CancellationToken };
Parallel.For(startIndex, stopIndex, parallelLoopOptions, i =>
{
int j = i;
switch (dataType)
{
case "jockeysPl":
tasks.Add(_scrapServices.ScrapSingleJockeyPlAsync(j));
break;
case "jockeysCz":
tasks.Add(_scrapServices.ScrapSingleJockeyCzAsync(j));
break;
}
});
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
I was looking at someone sample code for async and noticed a few issues with the way it was implemented. Whilst looking at the code I wondered if it would be more efficient to loop through a list using as parallel, rather than just looping through the list normally.
As far as I can tell there is very little difference in performance, both use up every processor, and both talk around the same amount of time to completed.
This is the first way of doing it
var tasks= Client.GetClients().Select(async p => await p.Initialize());
And this is the second
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
Am I correct in assuming there is no difference between the two?
The full program can be found below
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
RunCode1();
Console.WriteLine("Here");
Console.ReadLine();
RunCode2();
Console.WriteLine("Here");
Console.ReadLine();
}
private async static void RunCode1()
{
Stopwatch myStopWatch = new Stopwatch();
myStopWatch.Start();
var tasks= Client.GetClients().Select(async p => await p.Initialize());
Task.WaitAll(tasks.ToArray());
Console.WriteLine("Time ellapsed(ms): " + myStopWatch.ElapsedMilliseconds);
myStopWatch.Stop();
}
private async static void RunCode2()
{
Stopwatch myStopWatch = new Stopwatch();
myStopWatch.Start();
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
Task.WaitAll(tasks.ToArray());
Console.WriteLine("Time ellapsed(ms): " + myStopWatch.ElapsedMilliseconds);
myStopWatch.Stop();
}
}
class Client
{
public static IEnumerable<Client> GetClients()
{
for (int i = 0; i < 100; i++)
{
yield return new Client() { Id = Guid.NewGuid() };
}
}
public Guid Id { get; set; }
//This method has to be called before you use a client
//For the sample, I don't put it on the constructor
public async Task Initialize()
{
await Task.Factory.StartNew(() =>
{
Stopwatch timer = new Stopwatch();
timer.Start();
while(timer.ElapsedMilliseconds<1000)
{}
timer.Stop();
});
Console.WriteLine("Completed: " + Id);
}
}
}
There should be very little discernible difference.
In your first case:
var tasks = Client.GetClients().Select(async p => await p.Initialize());
The executing thread will (one at a time) start executing Initialize for each element in the client list. Initialize immediately queues a method to the thread pool and returns an uncompleted Task.
In your second case:
var tasks = Client.GetClients().AsParallel().Select(async p => await p.Initialize());
The executing thread will fork to the thread pool and (in parallel) start executing Initialize for each element in the client list. Initialize has the same behavior: it immediately queues a method to the thread pool and returns.
The two timings are nearly identical because you're only parallelizing a small amount of code: the queueing of the method to the thread pool and the return of an uncompleted Task.
If Initialize did some longer (synchronous) work before its first await, it may make sense to use AsParallel.
Remember, all async methods (and lambdas) start out being executed synchronously (see the official FAQ or my own intro post).
There's a singular major difference.
In the following code, you are taking it upon yourself to perform the partitioning. In other words, you're creating one Task object per item from the IEnumerable<T> that is returned from the call to GetClients():
var tasks= Client.GetClients().Select(async p => await p.Initialize());
In the second, the call to AsParallel is internally going to use Task instances to execute partitions of the IEnumerable<T> and you're going to have the initial Task that is returned from the lambda async p => await p.Initialize():
var tasks = Client.GetClients().AsParallel().
Select(async p => await p.Initialize());
Finally, you're not really doing anything by using async/await here. Granted, the compiler might optimize this out, but you're just waiting on a method that returns a Task and then returning a continuation that does nothing back through the lambda. That said, since the call to Initialize is already returning a Task, it's best to keep it simple and just do:
var tasks = Client.GetClients().Select(p => p.Initialize());
Which will return the sequence of Task instances for you.
To improve on the above 2 answers this is the simplest way to get an async/threaded execution that is awaitable:
var results = await Task.WhenAll(Client.GetClients()
.Select(async p => p.Initialize()));
This will ensure that it spins separate threads and that you get the results at the end. Hope that helps someone. Took me quite a while to figure this out properly since this is very not obvious and the AsParallel() function seems to be what you want but doesn't use async/await.