Task.Run in a for loop - C#

I have a for loop inside of which:
First: I want to compute the SQL required to run.
Second: Run that SQL asynchronously, without waiting for each call to finish inside the loop.
My code looks like:
for (int i = 0; i < gm.ListGroupMembershipUploadDetailsInput.GroupMembershipUploadInputList.Count; i++)
{
    // Compute
    SQL.Upload.UploadDetails.insertGroupMembershipRecords(
        gm.ListGroupMembershipUploadDetailsInput.GroupMembershipUploadInputList[i],
        max_seq_key++,
        max_unit_key++,
        out strSPQuery,
        out listParam);

    // Run the out SPQuery async
    Task.Run(() => rep.ExecuteStoredProcedureInputTypeAsync(strSPQuery, listParam));
}
The insertGroupMembershipRecords method in a separate DAL class looks like:
public static GroupMembershipUploadInput insertGroupMembershipRecords(
    GroupMembershipUploadInput gm,
    List<ChapterUploadFileDetailsHelper> ch,
    long max_seq_key,
    long max_unit_key,
    out string strSPQuery,
    out List<object> parameters)
{
    GroupMembershipUploadInput gmHelper = gm;
    int com_unit_key = -1;
    foreach (var item in com_unit_key_lst)
    {
        if (item.nk_ecode == gm.nk_ecode)
            com_unit_key = item.unit_key;
    }
    int intNumberOfInputParameters = 42;
    List<string> listOutputParameters = new List<string> { "o_outputMessage" };
    strSPQuery = SPHelper.createSPQuery("dw_stuart_macs.strx_inst_cnst_grp_mbrshp", intNumberOfInputParameters, listOutputParameters);
    var ParamObjects = new List<object>();
    ParamObjects.Add(SPHelper.createTdParameter("i_seq_key", max_seq_key, "IN", TdType.BigInt, 10));
    ParamObjects.Add(SPHelper.createTdParameter("i_chpt_cd", "S" + gm.appl_src_cd.Substring(1), "IN", TdType.VarChar, 4));
    ParamObjects.Add(SPHelper.createTdParameter("i_nk_ecode", gm.nk_ecode, "IN", TdType.Char, 5));
    // rest of the method
}
But when I tried this with a list count of 2,000, it did not insert 2,000 records in the DB, only 1.
Why doesn't this insert all the records in the input list? What am I missing?

Task.Run in a for loop
Even though this is not the question, the title itself is what I'm going to address. For CPU-bound operations you could use Parallel.For or Parallel.ForEach, but since we are I/O bound (i.e., database calls) we should rethink this approach.
Incidentally, this is also why only one record was inserted: strSPQuery and listParam are declared outside the loop, so every Task.Run lambda captures the same two variables, and by the time the tasks actually run they all see the values from the last iteration, with the tasks racing each other. Declaring them inside the loop body (or inside the lambda) gives each task its own copy.
The obvious answer here is to create a list of tasks that represent the asynchronous operations and then await them with the Task.WhenAll API, like this:
public async Task InvokeAllTheSqlAsync()
{
    var list = gm.ListGroupMembershipUploadDetailsInput.GroupMembershipUploadInputList;
    var tasks = Enumerable.Range(0, list.Count).Select(i =>
    {
        var value = list[i];
        string strSPQuery;
        List<SqlParameter> listParam;
        SQL.Upload.UploadDetails.insertGroupMembershipRecords(
            value,
            max_seq_key++,
            max_unit_key++,
            out strSPQuery,
            out listParam);
        return rep.ExecuteStoredProcedureInputTypeAsync(strSPQuery, listParam);
    });
    await Task.WhenAll(tasks);
}
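If firing thousands of database calls at once exhausts the connection pool, the same Task.WhenAll pattern can be throttled with SemaphoreSlim. A minimal sketch; the ExecuteStoredProcedureAsync below is a placeholder standing in for the real DAL call, not the question's actual method:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    // Placeholder for the real database call.
    static async Task ExecuteStoredProcedureAsync(int i)
    {
        await Task.Delay(10); // simulate I/O
    }

    static async Task Main()
    {
        using var gate = new SemaphoreSlim(10); // at most 10 concurrent calls
        var tasks = Enumerable.Range(0, 100).Select(async i =>
        {
            await gate.WaitAsync();
            try { await ExecuteStoredProcedureAsync(i); }
            finally { gate.Release(); }
        });
        await Task.WhenAll(tasks);
        Console.WriteLine("done");
    }
}
```

All 100 calls are still awaited together, but no more than 10 are in flight at any moment.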

Related

Divide Db Query Result into as many tasks as I want

I want to divide the DB query result into as many tasks as I want. How can I do that? For example, I want to hand off every 300 rows to a separate task at the same time, but each task must get a different 300 rows.
I'm not sure what you mean by
I want to give every 300 rows to the same process at the same time
However, one possible solution for dividing the query result into a list of tasks could be this:
Count total records:
var count = await context.Entities.CountAsync();
Calculate the total number of database calls you need:
const int take = 300;
var dbCallsCount = (int)Math.Ceiling((double)count / take);
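As a quick sanity check of the math, 2,500 rows with take = 300 needs 9 calls, the last one fetching only 100 rows (the numbers here are illustrative, not from the question):

```csharp
using System;

class Program
{
    static void Main()
    {
        const int take = 300;
        int count = 2500;
        // Ceiling rounds the partial last page up to a full call.
        int dbCallsCount = (int)Math.Ceiling((double)count / take);
        Console.WriteLine(dbCallsCount); // 9
    }
}
```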
Create a method to fetch data (note that you cannot run parallel queries through the same DbContext object):
public async Task<List<Entity>> FetchDataAsync(int page, int take)
{
    using (var context = new DbContext("ConnectionString"))
    {
        // page is zero-based, matching the loop below
        var result = await context.Entities
            .AsNoTracking()
            .Skip(page * take)
            .Take(take)
            .ToListAsync();
        return result;
    }
}
Create a List of tasks to fetch data:
var taskList = new List<Task<List<Entity>>>();
for(var i = 0; i < dbCallsCount; i++)
taskList.Add(FetchDataAsync(i, take));
var result = await Task.WhenAll(taskList);
This can be made into a generic method that builds the list of fetch tasks:
public async Task<List<Task<List<TEntity>>>> DivideDbQueryIntoTasks<TEntity>(int take) where TEntity : class
{
    int count;
    using (var context = new DbContext("ConnectionString"))
    {
        count = await context.Set<TEntity>().CountAsync();
    }
    var dbCallsCount = (int)Math.Ceiling((double)count / take);

    // Local function; page is zero-based
    async Task<List<TEntity>> FetchDataAsync(int page)
    {
        using (var context = new DbContext("ConnectionString"))
        {
            var result = await context.Set<TEntity>()
                .AsNoTracking()
                .Skip(page * take)
                .Take(take)
                .ToListAsync();
            return result;
        }
    }

    var taskList = new List<Task<List<TEntity>>>();
    for (var i = 0; i < dbCallsCount; i++)
        taskList.Add(FetchDataAsync(i));
    return taskList;
}
And call it in this way:
var tasks = await DivideDbQueryIntoTasks<MyEntity>(300);
foreach (Task<List<MyEntity>> task in tasks)
{
    ...
}

How do I run a method both parallel and sequentially in C#?

I have a C# console app. In this app, I have a method that I will call DoWorkAsync. For the context of this question, this method looks like this:
private async Task<string> DoWorkAsync()
{
    System.Threading.Thread.Sleep(5000);
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    await Task.CompletedTask;
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
I call DoWorkAsync from another method that determines a) how many times it will get run and b) whether each call will run in parallel or sequentially. That method looks like this:
private async Task<Task<string>[]> DoWork(int iterations, bool runInParallel)
{
    var tasks = new List<Task<string>>();
    for (var i = 0; i < iterations; i++)
    {
        if (runInParallel)
        {
            var task = Task.Run(() => DoWorkAsync());
            tasks.Add(task);
        }
        else
        {
            await DoWorkAsync();
        }
    }
    return tasks.ToArray();
}
After all of the tasks are completed, I want to display the results. To do this, I have code that looks like this:
var random = new Random();
var tasks = await DoWork(random.Next(10, 101), true);
Task.WaitAll(tasks);
foreach (var task in tasks)
{
    Console.WriteLine(task.Result);
}
This code works as expected if the code runs in parallel (i.e. runInParallel is true). However, when runInParallel is false (i.e. I want to run the Tasks sequentially) the Task array doesn't get populated. So, the caller doesn't have any results to work with. I don't know how to fix it though. I'm not sure how to add the method call as a Task that will run sequentially. I understand that the idea behind Tasks is to run in parallel. However, I have this need to toggle between parallel and sequential.
Thank you!
the Task array doesn't get populated.
So populate it:
else
{
    var task = DoWorkAsync();
    tasks.Add(task);
    await task;
}
P.S.
Also, your DoWorkAsync looks kinda wrong to me: why Thread.Sleep and not await Task.Delay? Task.Delay is the more correct way to simulate asynchronous execution, and you won't need the await Task.CompletedTask workaround that way. And if you expect DoWorkAsync to be CPU bound, just make it like:
private Task<string> DoWorkAsync()
{
    return Task.Run(() =>
    {
        // your cpu bound work
        return "string";
    });
}
After that you can do something like this (for both async/cpu bound work):
private async Task<string[]> DoWork(int iterations, bool runInParallel)
{
    if (runInParallel)
    {
        var tasks = Enumerable.Range(0, iterations)
            .Select(i => DoWorkAsync());
        return await Task.WhenAll(tasks);
    }
    else
    {
        var result = new string[iterations];
        for (var i = 0; i < iterations; i++)
        {
            result[i] = await DoWorkAsync();
        }
        return result;
    }
}
Why is DoWorkAsync an async method?
It isn't currently doing anything asynchronous.
It seems that you are trying to utilise multiple threads to improve the performance of expensive CPU-bound work, so you would be better to make use of Parallel.For, which is designed for this purpose:
private string DoWork()
{
    System.Threading.Thread.Sleep(5000);
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}
private string[] DoWork(int iterations, bool runInParallel)
{
    var results = new string[iterations];
    if (runInParallel)
    {
        // Note: the upper bound is exclusive, so pass iterations, not iterations - 1
        Parallel.For(0, iterations, i => results[i] = DoWork());
    }
    else
    {
        for (int i = 0; i < iterations; i++) results[i] = DoWork();
    }
    return results;
}
Then:
var random = new Random();
var serial = DoWork(random.Next(10, 101), false);
var parallel = DoWork(random.Next(10, 101), true);
I think you'd be better off doing the following:
Create a function that creates a (cold) list of tasks (or an array Task<string>[] for instance). No need to run them. Let's call this GetTasks()
var jobs = GetTasks();
Then, if you want to run them "sequentially", just do
var results = new List<string>();
foreach (var job in jobs)
{
    var result = await job;
    results.Add(result);
}
return results;
If you want to run them in parallel:
foreach (var job in jobs)
{
    job.Start();
}
var results = await Task.WhenAll(jobs);
Another note: all of this should itself return a Task<string[]>; the nested Task<Task<...>> smells like a problem.

How to increase performance of a for loop using C#

I compare task data from Microsoft project using a nested for loop. But since the project has many records (more than 1000), it is very slow.
How do I improve the performance?
for (int n = 1; n < thisProject.Tasks.Count; n++)
{
    string abc = thisProject.Tasks[n].Name;
    string def = thisProject.Tasks[n].ResourceNames;
    for (int l = thisProject.Tasks.Count; l > n; l--)
    {
        // MessageBox.Show(thisProject.Tasks[l].Name);
        if (abc == thisProject.Tasks[l].Name && def == thisProject.Tasks[l].ResourceNames)
        {
            thisProject.Tasks[l].Delete();
        }
    }
}
As you notice, I am comparing the Name and ResourceNames on the individual Task and when I find a duplicate, I call Task.Delete to get rid of the duplicate
A hash check should be a lot faster in this case than nested looping, i.e. O(n) vs O(n²).
First, provide an equality comparer of your own:
class TaskComparer : IEqualityComparer<Task> {
    public bool Equals(Task x, Task y) {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null)) return false;
        if (ReferenceEquals(y, null)) return false;
        if (x.GetType() != y.GetType()) return false;
        return string.Equals(x.Name, y.Name) && string.Equals(x.ResourceNames, y.ResourceNames);
    }

    public int GetHashCode(Task task) {
        unchecked {
            return
                ((task?.Name?.GetHashCode() ?? 0) * 397) ^
                (task?.ResourceNames?.GetHashCode() ?? 0);
        }
    }
}
Don't worry too much about the GetHashCode implementation; this is just boilerplate code which composes a hash code from the two properties.
Now you have this class for comparison and hashing, you can use the below code to remove your dupes
var set = new HashSet<Task>(new TaskComparer());
for (int i = thisProject.Tasks.Count - 1; i >= 0; --i) {
    if (!set.Add(thisProject.Tasks[i]))
        thisProject.Tasks[i].Delete();
}
As you notice, you are simply scanning all your elements while storing them into a HashSet. The HashSet checks, based on our equality comparer, whether each element is a duplicate.
Now, since you want to delete them, the detected dupes are deleted. You can modify this code to extract the unique items instead of deleting the dupes, by reversing the condition to if (set.Add(thisProject.Tasks[i])) and processing inside that if.
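A runnable sketch of that unique-extraction variant, using a stand-in Task class and a simplified comparer since the real MS Project object model isn't available here:

```csharp
using System;
using System.Collections.Generic;

class Task
{
    public string Name;
    public string ResourceNames;
}

class TaskComparer : IEqualityComparer<Task>
{
    public bool Equals(Task x, Task y) =>
        x.Name == y.Name && x.ResourceNames == y.ResourceNames;

    public int GetHashCode(Task t) =>
        (t.Name, t.ResourceNames).GetHashCode();
}

class Program
{
    static void Main()
    {
        var tasks = new List<Task>
        {
            new Task { Name = "a", ResourceNames = "ra" },
            new Task { Name = "a", ResourceNames = "ra" }, // duplicate
            new Task { Name = "b", ResourceNames = "rb" },
        };

        var set = new HashSet<Task>(new TaskComparer());
        var unique = new List<Task>();
        foreach (var t in tasks)
            if (set.Add(t))       // Add returns false for duplicates
                unique.Add(t);

        Console.WriteLine(unique.Count); // prints 2
    }
}
```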
Microsoft Project has a Sort method which makes simple work of this problem. Sort the tasks by Name, Resource Names, and Unique ID and then loop through comparing adjacent tasks and delete duplicates. By using Unique ID as the third sort key you can be sure to delete the duplicate that was added later. Alternatively, you can use the task ID to remove tasks that are lower down in the schedule. Here's a VBA example of how to do this:
Sub RemoveDuplicateTasks()
    Dim proj As Project
    Set proj = ActiveProject
    Application.Sort Key1:="Name", Ascending1:=True, Key2:="Resource Names", Ascending2:=True, Key3:="Unique ID", Ascending3:=True, Renumber:=False, Outline:=False
    Application.SelectAll
    Dim tsks As Tasks
    Set tsks = Application.ActiveSelection.Tasks
    Dim i As Integer
    i = 1 ' Tasks collections are 1-based
    Do While i < tsks.Count
        If tsks(i).Name = tsks(i + 1).Name And tsks(i).ResourceNames = tsks(i + 1).ResourceNames Then
            tsks(i + 1).Delete
        Else
            i = i + 1
        End If
    Loop
    Application.Sort Key1:="ID", Renumber:=False, Outline:=False
    Application.SelectBeginning
End Sub
Note: This question relates to the algorithm, not the syntax; VBA is easy to translate to C#.
This gives you only the duplicate occurrences (every copy after the first), so you can delete them from your original list:
thisProject.Tasks.GroupBy(x => new { x.Name, x.ResourceNames }).Where(g => g.Count() > 1).SelectMany(g => g.Skip(1));
The first occurrence in each group is kept, so only the later duplicates get removed.
A Linq way of getting distinct elements from your Tasks list :
public class Task
{
    public string Name { get; set; }
    public string ResourceName { get; set; }
}

public class Program
{
    public static void Main()
    {
        List<Task> Tasks = new List<Task>();
        Tasks.Add(new Task() { Name = "a", ResourceName = "ra" });
        Tasks.Add(new Task() { Name = "b", ResourceName = "rb" });
        Tasks.Add(new Task() { Name = "c", ResourceName = "rc" });
        Tasks.Add(new Task() { Name = "a", ResourceName = "ra" });
        Tasks.Add(new Task() { Name = "b", ResourceName = "rb" });
        Tasks.Add(new Task() { Name = "c", ResourceName = "rc" });

        Console.WriteLine("Initial List :");
        foreach (var t in Tasks)
        {
            Console.WriteLine(t.Name);
        }

        // Here comes the interesting part
        List<Task> Tasks2 = Tasks.GroupBy(x => new { x.Name, x.ResourceName })
            .Select(g => g.First()).ToList();

        Console.WriteLine("Final List :");
        foreach (Task t in Tasks2)
        {
            Console.WriteLine(t.Name);
        }
    }
}
This selects every first elements having the same Name and ResourceName.

TaskFactory, Starting a new Task when one ends

I have found many methods of using the TaskFactory, but I could not find anything about starting several tasks, watching for one to end, and then starting another one.
I always want to have 10 tasks working.
I want something like this
int nTotalTasks = 10;
int nCurrentTask = 0;
Task<bool>[] tasks = new Task<bool>[nTotalTasks];
for (int i = 0; i < 1000; i++)
{
    string param1 = "test";
    string param2 = "test";
    if (nCurrentTask < 10) // if there are less than 10 tasks then start another one
        tasks[nCurrentTask++] = Task.Factory.StartNew<bool>(() =>
        {
            MyClass cls = new MyClass();
            bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
            return bRet;
        });
    // How can I stop the for loop until a new task is finished and start a new one?
}
Check out the Task.WaitAny method:
Waits for any of the provided Task objects to complete execution.
Example from the documentation:
var t1 = Task.Factory.StartNew(() => DoOperation1());
var t2 = Task.Factory.StartNew(() => DoOperation2());
Task.WaitAny(t1, t2);
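Applied to the question, Task.WaitAny can maintain a fixed-size window: fill all ten slots, then wait for any task to finish and reuse its slot before starting the next one. A sketch with a dummy work method standing in for the two-minute Method1:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    // Stand-in for the long-running Method1 from the question.
    static bool DoWork(int i)
    {
        Task.Delay(10).Wait();
        return true;
    }

    static void Main()
    {
        const int slots = 10;
        var tasks = new Task<bool>[slots];
        int completed = 0;

        for (int i = 0; i < 100; i++)
        {
            int item = i; // capture a copy, not the loop variable
            int free = Array.FindIndex(tasks, t => t == null);
            if (free < 0)
            {
                free = Task.WaitAny(tasks); // block until any slot frees up
                completed++;
            }
            tasks[free] = Task.Run(() => DoWork(item));
        }

        Task.WaitAll(tasks.Where(t => t != null).ToArray());
        completed += tasks.Count(t => t != null);
        Console.WriteLine(completed); // 100
    }
}
```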
I would use a combination of Microsoft's Reactive Framework (NuGet "Rx-Main") and TPL for this. It becomes very simple.
Here's the code:
int nTotalTasks = 10;
string param1 = "test";
string param2 = "test";

IDisposable subscription =
    Observable
        .Range(0, 1000)
        .Select(i => Observable.FromAsync(() => Task.Factory.StartNew<bool>(() =>
        {
            MyClass cls = new MyClass();
            bool bRet = cls.Method1(param1, param2, i); // takes up to 2 minutes to finish
            return bRet;
        })))
        .Merge(nTotalTasks)
        .ToArray()
        .Subscribe((bool[] results) =>
        {
            /* Do something with the results. */
        });
The key part here is the .Merge(nTotalTasks) which limits the number of concurrent tasks.
If you need to stop the processing part way thru just call subscription.Dispose() and everything gets cleaned up for you.
If you want to process each result as they are produced you can change the code from the .Merge(...) like this:
.Merge(nTotalTasks)
.Subscribe((bool result) =>
{
/* Do something with each result. */
});
This should be all you need. It's not complete, but the idea is: wait for the first task to complete, then start the next one.
Task.WaitAny(tasksToWaitOn);
Task.Factory.StartNew(...)
Have you seen the BlockingCollection class? It allows you to have multiple threads running in parallel, and you can wait for results from one task before executing another.
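A minimal sketch of that idea, assuming the work items are simple integers: a bounded BlockingCollection feeds a fixed pool of ten consumer tasks:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        var queue = new BlockingCollection<int>(boundedCapacity: 10);
        int processed = 0;

        // 10 consumers, each pulling work items as they become available.
        var consumers = new Task[10];
        for (int c = 0; c < consumers.Length; c++)
        {
            consumers[c] = Task.Run(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                    Interlocked.Increment(ref processed);
            });
        }

        for (int i = 0; i < 1000; i++)
            queue.Add(i);        // blocks when the buffer is full
        queue.CompleteAdding();  // signal no more work

        Task.WaitAll(consumers);
        Console.WriteLine(processed); // 1000
    }
}
```

The bounded capacity means the producer can never run too far ahead of the ten workers.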
The answer depends on whether the tasks to be scheduled are CPU or I/O bound.
For CPU-intensive work I would use the Parallel.For() API, setting the number of threads/tasks through the MaxDegreeOfParallelism property of ParallelOptions.
For I/O bound work the number of concurrently executing tasks can be significantly larger than the number of available CPUs, so the strategy is to rely on async methods as much as possible, which reduces the total number of threads waiting for completion.
How can I stop the for loop until a new task is finished and start a
new one?
The loop can be throttled by using await:
static void Main(string[] args)
{
    var task = DoWorkAsync();
    task.Wait();
    // handle results
    // task.Result;
    Console.WriteLine("Done.");
}

async static Task<bool> DoWorkAsync()
{
    const int NUMBER_OF_SLOTS = 10;
    string param1 = "test";
    string param2 = "test";
    var results = new bool[NUMBER_OF_SLOTS];
    AsyncWorkScheduler ws = new AsyncWorkScheduler(NUMBER_OF_SLOTS);
    for (int i = 0; i < 1000; ++i)
    {
        int index = i; // capture a copy of the loop variable for the closure
        await ws.ScheduleAsync((slotNumber) => DoWorkAsync(index, slotNumber, param1, param2, results));
    }
    ws.Complete();
    await ws.Completion;
    return Array.TrueForAll(results, r => r);
}
async static Task DoWorkAsync(int index, int slotNumber, string param1, string param2, bool[] results)
{
    results[slotNumber] = await Task.Factory.StartNew<bool>(() =>
    {
        MyClass cls = new MyClass();
        bool bRet = cls.Method1(param1, param2, index); // takes up to 2 minutes to finish
        return bRet;
    });
}
A helper class AsyncWorkScheduler uses TPL.DataFlow components as well as Task.WhenAll():
class AsyncWorkScheduler
{
    public AsyncWorkScheduler(int numberOfSlots)
    {
        m_slots = new Task[numberOfSlots];
        m_availableSlots = new BufferBlock<int>();
        m_errors = new List<Exception>();
        m_tcs = new TaskCompletionSource<bool>();
        m_completionPending = 0;

        // Initial state: all slots are available
        for (int i = 0; i < m_slots.Length; ++i)
        {
            m_slots[i] = Task.FromResult(false);
            m_availableSlots.Post(i);
        }
    }

    public async Task ScheduleAsync(Func<int, Task> action)
    {
        if (Volatile.Read(ref m_completionPending) != 0)
        {
            throw new InvalidOperationException("Unable to schedule new items.");
        }

        // Acquire a slot
        int slotNumber = await m_availableSlots.ReceiveAsync().ConfigureAwait(false);

        // Schedule a new task for a given slot
        var task = action(slotNumber);

        // Store a continuation on the task to handle completion events
        m_slots[slotNumber] = task.ContinueWith(t => HandleCompletedTask(t, slotNumber), TaskContinuationOptions.ExecuteSynchronously);
    }

    public async void Complete()
    {
        if (Interlocked.CompareExchange(ref m_completionPending, 1, 0) != 0)
        {
            return;
        }

        // Signal the queue's completion
        m_availableSlots.Complete();
        await Task.WhenAll(m_slots).ConfigureAwait(false);

        // Set completion
        if (m_errors.Count != 0)
        {
            m_tcs.TrySetException(m_errors);
        }
        else
        {
            m_tcs.TrySetResult(true);
        }
    }

    public Task Completion
    {
        get { return m_tcs.Task; }
    }

    void SetFailed(Exception error)
    {
        lock (m_errors)
        {
            m_errors.Add(error);
        }
    }

    void HandleCompletedTask(Task task, int slotNumber)
    {
        if (task.IsFaulted || task.IsCanceled)
        {
            SetFailed(task.Exception);
            return;
        }

        if (Volatile.Read(ref m_completionPending) == 1)
        {
            return;
        }

        // Release a slot
        m_availableSlots.Post(slotNumber);
    }

    int m_completionPending;
    List<Exception> m_errors;
    BufferBlock<int> m_availableSlots;
    TaskCompletionSource<bool> m_tcs;
    Task[] m_slots;
}

How to pass different range on parallel.for?

I need to process a single file in parallel by sending skip/take counts like 1-1000, 1001-2000, 2001-3000, etc.
Code for parallel process
var line = File.ReadAllLines("D:\\OUTPUT.CSV").Length;
Parallel.For(1, line, new ParallelOptions { MaxDegreeOfParallelism = 10 }, x =>
{
    DoSomething(skip, take);
});
Function
public static void DoSomething(int skip, int take)
{
    // code here
}
How can I send the skip and take counts to the parallel processing as per my requirement?
You can do this rather easily with PLINQ. If you want batches of 1000, you can do:
const int BatchSize = 1000;
var pageAmount = (int)Math.Ceiling((double)line / BatchSize);
Enumerable.Range(0, pageAmount)
    .AsParallel()
    .ForAll(page => DoSomething(page));

public void DoSomething(int page)
{
    var currentLines = source.Skip(page * BatchSize).Take(BatchSize);
    // do something with the selected lines
}
Note that ForAll executes the work eagerly; a lazy Select would not run anything until its results were enumerated, and it also requires the delegate to return a value.
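An alternative sketch for fixed-size ranges is Partitioner.Create, which hands each Parallel.ForEach worker a (fromInclusive, toExclusive) pair, so the skip/take counts fall out of the range itself (the line count here is illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int totalLines = 3250;   // e.g. File.ReadAllLines(...).Length
        int processed = 0;

        // Split [0, totalLines) into chunks of 1000: (0,1000), (1000,2000), ...
        var ranges = Partitioner.Create(0, totalLines, 1000);
        Parallel.ForEach(ranges, range =>
        {
            int skip = range.Item1;
            int take = range.Item2 - range.Item1;
            // DoSomething(skip, take) would go here.
            Interlocked.Add(ref processed, take);
        });

        Console.WriteLine(processed); // 3250
    }
}
```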
