Divide DB query result into as many tasks as I want - C#

I want to divide a DB query result into as many tasks as I want. How can I do this? For example, I want to hand every 300 rows to the same process at the same time, but each batch of 300 rows must be a different set of 300 rows.

I don't know what you mean by:
I want to give every 300 rows to the same process at the same time
However, one possible solution for dividing the query result into a list of tasks could be this:
Count total records:
var count = await context.Entities.CountAsync();
Calculate the total number of database calls you need (e.g. 1,000 rows with take = 300 gives 4 calls):
const int take = 300;
var dbCallsCount = (int)Math.Ceiling((double)count / take);
Create a method to fetch data (note that you cannot run parallel queries through the same DbContext object):
public async Task<List<Entity>> FetchDataAsync(int page, int take)
{
    using (var context = new DbContext("ConnectionString"))
    {
        var result = await context.Entities
            .AsNoTracking()
            .OrderBy(e => e.Id) // Skip/Take need a deterministic order; assumes Entity has an Id
            .Skip(page * take)  // page is zero-based, matching the loop below
            .Take(take)
            .ToListAsync();
        return result;
    }
}
Create a List of tasks to fetch data:
var taskList = new List<Task<List<Entity>>>();
for(var i = 0; i < dbCallsCount; i++)
taskList.Add(FetchDataAsync(i, take));
var result = await Task.WhenAll(taskList);
This can be turned into a generic method that returns a list of tasks for fetching the data:
public async Task<List<Task<List<TEntity>>>> DivideDbQueryIntoTasks<TEntity>(int take) where TEntity : class
{
    int count;
    using (var context = new DbContext("ConnectionString"))
    {
        count = await context.Set<TEntity>().CountAsync();
    }
    var dbCallsCount = (int)Math.Ceiling((double)count / take);

    // Local function; it reuses the outer TEntity and take instead of declaring its own
    async Task<List<TEntity>> FetchDataAsync(int page)
    {
        using (var context = new DbContext("ConnectionString"))
        {
            var result = await context.Set<TEntity>()
                .AsNoTracking()
                .Skip(page * take) // page is zero-based; add an OrderBy for deterministic paging
                .Take(take)
                .ToListAsync();
            return result;
        }
    }

    var taskList = new List<Task<List<TEntity>>>();
    for (var i = 0; i < dbCallsCount; i++)
        taskList.Add(FetchDataAsync(i));
    return taskList;
}
And call it in this way:
var tasks = await DivideDbQueryIntoTasks<MyEntity>(300);
foreach (Task<List<MyEntity>> task in tasks)
{
...
}
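If the goal is simply to end up with all the rows once every batch has been fetched, a minimal sketch of consuming the same tasks could be:
var batches = await Task.WhenAll(tasks);           // all fetches run concurrently
var allRows = batches.SelectMany(b => b).ToList(); // flatten the 300-row batches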

Related

How do I run a method both in parallel and sequentially in C#?

I have a C# console app. In this app, I have a method that I will call DoWorkAsync. For the context of this question, this method looks like this:
private async Task<string> DoWorkAsync()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
await Task.CompletedTask;
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
I call DoWorkAsync from another method that determines a) how many times it will be run and b) whether each call runs in parallel or sequentially. That method looks like this:
private async Task<Task<string>[]> DoWork(int iterations, bool runInParallel)
{
var tasks = new List<Task<string>>();
for (var i=0; i<iterations; i++)
{
if (runInParallel)
{
var task = Task.Run(() => DoWorkAsync());
tasks.Add(task);
}
else
{
await DoWorkAsync();
}
}
return tasks.ToArray();
}
After all of the tasks are completed, I want to display the results. To do this, I have code that looks like this:
var random = new Random();
var tasks = await DoWork(random.Next(10, 101), runInParallel);
Task.WaitAll(tasks);
foreach (var task in tasks)
{
Console.WriteLine(task.Result);
}
This code works as expected if the code runs in parallel (i.e. runInParallel is true). However, when runInParallel is false (i.e. I want to run the Tasks sequentially) the Task array doesn't get populated. So, the caller doesn't have any results to work with. I don't know how to fix it though. I'm not sure how to add the method call as a Task that will run sequentially. I understand that the idea behind Tasks is to run in parallel. However, I have this need to toggle between parallel and sequential.
Thank you!
the Task array doesn't get populated.
So populate it:
else
{
var task = DoWorkAsync();
tasks.Add(task);
await task;
}
P.S.
Also, your DoWorkAsync looks kinda wrong to me: why Thread.Sleep and not await Task.Delay? That is the more correct way to simulate asynchronous execution, and you won't need await Task.CompletedTask this way (a sketch follows the next snippet). And if you expect DoWorkAsync to be CPU bound, just make it like:
private Task<string> DoWorkAsync()
{
return Task.Run(() =>
{
// your cpu bound work
return "string";
});
}
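And here is the Task.Delay variant mentioned above, a minimal sketch that keeps the original random-string logic but waits asynchronously:
private async Task<string> DoWorkAsync()
{
    await Task.Delay(5000); // non-blocking wait; no await Task.CompletedTask needed
    var random = new Random();
    var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    var length = random.Next(10, 101);
    return new string(Enumerable.Repeat(chars, length)
        .Select(s => s[random.Next(s.Length)]).ToArray());
}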
After that you can do something like this (for both async/cpu bound work):
private async Task<string[]> DoWork(int iterations, bool runInParallel)
{
if(runInParallel)
{
var tasks = Enumerable.Range(0, iterations)
.Select(i => DoWorkAsync());
return await Task.WhenAll(tasks);
}
else
{
var result = new string[iterations];
for (var i = 0; i < iterations; i++)
{
result[i] = await DoWorkAsync();
}
return result;
}
}
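A quick usage sketch of this combined method:
var parallel = await DoWork(10, runInParallel: true);    // all ten calls in flight at once
var sequential = await DoWork(10, runInParallel: false); // one call at a time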
Why is DoWorkAsync an async method?
It isn't currently doing anything asynchronous.
It seems that you are trying to utilise multiple threads to improve the performance of expensive CPU-bound work, so you would be better to make use of Parallel.For, which is designed for this purpose:
private string DoWork()
{
System.Threading.Thread.Sleep(5000);
var random = new Random();
var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
var length = random.Next(10, 101);
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
private string[] DoWork(int iterations, bool runInParallel = false) // default lets the serial call below pass one argument
{
var results = new string[iterations];
if (runInParallel)
{
Parallel.For(0, iterations, i => results[i] = DoWork()); // the upper bound is exclusive, so use iterations, not iterations - 1
}
else
{
for (int i = 0; i < iterations; i++) results[i] = DoWork();
}
return results;
}
Then:
var random = new Random();
var serial = DoWork(random.Next(10, 101));
var parallel = DoWork(random.Next(10, 101), true);
I think you'd be better off doing the following:
Create a function that builds a (cold) list of tasks (or an array Task<string>[], for instance). No need to run them. Let's call it GetTasks(); a sketch of it follows at the end of this answer.
var jobs = GetTasks();
Then, if you want to run them "sequentially", just do
var results = new List<string>();
foreach (var job in jobs)
{
    job.Start(); // a cold task does nothing until it is started
    var result = await job;
    results.Add(result);
}
return results;
If you want to run them in parallel :
foreach (var job in jobs)
{
job.Start();
}
var results = await Task.WhenAll(jobs);
Another note: all of this should itself return a Task<string[]>; the nested Task<Task<...>> smells like a problem.
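For reference, a minimal sketch of what GetTasks() could look like (hypothetical helper; it reuses a synchronous string-producing DoWork() like the one shown earlier, and cold tasks are created with the Task<TResult> constructor so they only run once Start() is called):
private Task<string>[] GetTasks(int iterations)
{
    return Enumerable.Range(0, iterations)
        .Select(_ => new Task<string>(() => DoWork())) // cold: created, not yet started
        .ToArray();
}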

Single thread working and multithread not working

Below is my current code, which gets 500 documents (JSON format) from the documentDB per call. I can only fetch 500 per search, and I add them to a concurrent bag (in parallel). The data fetched is based on the id number I pass to the API, which picks documents from that range, e.g. id = 500 gets documents 501 - 1000. The code below fills the concurrent bag with 25k documents as expected.
int threadNumber = 5;
var concurrentBag = new ConcurrentBag<docClass>();
if (batch == 25000)
{
id = 500;
while (id <= 25000)
{
docs = await client.SearchDocuments<docClass>(GetFollowUpRequest(id), requestOptions);
docClass lastdoc = docs.Documents.Last();
lastid = lastdoc.Id.Id;
Parallel.ForEach(docs.Documents, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, item =>
{
concurrentBag.Add(item);
});
id = id + 500;
}
}
I wanted to run this whole while loop on multiple threads so that I can make multiple calls to the API and fetch 500 documents at a time in parallel. I tried to modify the code as below, but I always see only 500 documents in the concurrent bag 'concurrentBag' after the whole run, and the skip id stays at 500 and doesn't increment.
int threadNumber = 5;
var concurrentBag = new ConcurrentBag<docClass>();
if (batch == 25000)
{
id = 500;
Task[] tasks = new Task[threadNumber];
for (int j = 0; j < threadNumber; j++)
{
tasks[j] = Task.Run(async() =>
{
while (id <= 25000)
{
docs = await client.SearchDocuments<docClass>(GetFollowUpRequest(id), requestOptions);
docClass lastdoc = docs.Documents.Last();
lastid = lastdoc.Id.Id;
Parallel.ForEach(docs.Documents, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, item =>
{
concurrentBag.Add(item);
});
id = id + 500;
}
});
}
}
Can you please help what am I doing wrong here?
The immediate bug is that all of your tasks share and mutate the same id (and docs) variables, so the increments race and the tasks keep re-reading the same page. More fundamentally: for loading documents from external resources, use an asynchronous approach without extra threads.
Note that when you download external resources in parallel, the extra threads do no work; they just sit waiting for the response, so threads are simply wasted ;)
The asynchronous approach makes it possible to launch multiple requests almost simultaneously, without waiting for each task to complete, and then wait once until all of them are ready.
var maxDocuments = 25000;
var step = 500;
var documentTasks = Enumerable.Range(1, maxDocuments / step)
    .Select(offset => step * offset) // 500, 1000, ..., 25000
    .Select(id => client.SearchDocuments<docClass>(GetFollowUpRequest(id), requestOptions))
    .ToArray();
await Task.WhenAll(documentTasks);
var allDocuments = documentTasks
    .Select(task => task.Result)
    .SelectMany(result => result.Documents) // .Documents, as in the question's code
    .ToArray();

LINQ IQueryable returning same rows with skip and take

Using MVC Entity Framework I'm calling a function with AJAX that passes in skip and take parameters.
[HttpGet]
public async Task<ActionResult> _ViewMore(int take, int skip)
{
    var c = await GetContentForCulture(take, skip);
    return View(c);
}
}
public async Task<List<PartialContent>> GetContentForCulture(int take, int skip)
{
return await ContextHelper.SearchContent(take, skip);
}
public static async Task<List<PartialContent>> SearchContent( int take, int skip)
{
try
{
using (var context = new Context())
{
var content = context.ContentEntities.SearchContent(take, skip);
var f = await content.Select(s => new PartialContent
{
Subtype = s.Subtype,
Id = s.Id,
MainImage = s.MainImage,
}).ToListAsync();
return f;
}
}
catch (Exception ex)
{
// Log.Err(ex.Message, ex);
return null;
}
}
public static IQueryable<T> SearchContent<T>(this IQueryable<T> source, int take, int skip)
    where T : ContentEntity
{
    return source.Where(m => m.IsPublished).OrderByDescending(m => m.DatePublished).Skip(skip).Take(take);
}
My issue is that every time I call the function the same rows are returned, even though I can see in the debugger that the skip value increments, and I have hundreds of rows to fetch from.
The solution was to add another order-by clause, as suggested by Damien_The_Unbeliever, so that the row order (and therefore the paging) is deterministic (shown below). The paging call itself:
// pageSize is how many rows you want per page, and
// index is the page number: it should be 0 the first time, then increase by one on every call
var c = await context.ContentEntities.Skip(pageSize * index).Take(pageSize).ToListAsync();
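For completeness, the extra order-by clause could look like this in the extension method above (assuming ContentEntity has a unique Id to break ties between rows sharing the same DatePublished):
return source.Where(m => m.IsPublished)
    .OrderByDescending(m => m.DatePublished)
    .ThenBy(m => m.Id) // unique tie-breaker makes Skip/Take paging deterministic
    .Skip(skip)
    .Take(take);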

Task.Run in a for loop

I have a for loop in which:
First: I want to compute the SQL required to run.
Second: I want to run the SQL asynchronously, without waiting for each statement to finish inside the loop.
My code looks like:
for (int i = 0; i < gm.ListGroupMembershipUploadDetailsInput.GroupMembershipUploadInputList.Count; i++)
{
// Compute
SQL.Upload.UploadDetails.insertGroupMembershipRecords(
gm.ListGroupMembershipUploadDetailsInput.GroupMembershipUploadInputList[i],max_seq_key++,max_unit_key++,
out strSPQuery,
out listParam);
//Run the out SPQuery async
Task.Run(() => rep.ExecuteStoredProcedureInputTypeAsync(strSPQuery, listParam));
}
The insertGroupMembershipRecords method in a separate DAL class looks like :
public static GroupMembershipUploadInput insertGroupMembershipRecords(GroupMembershipUploadInput gm, List<ChapterUploadFileDetailsHelper> ch, long max_seq_key, long max_unit_key, out string strSPQuery, out List<object> parameters)
{
GroupMembershipUploadInput gmHelper = new GroupMembershipUploadInput();
gmHelper = gm;
int com_unit_key = -1;
foreach(var item in com_unit_key_lst){
if (item.nk_ecode == gm.nk_ecode)
com_unit_key = item.unit_key;
}
int intNumberOfInputParameters = 42;
List<string> listOutputParameters = new List<string> { "o_outputMessage" };
strSPQuery = SPHelper.createSPQuery("dw_stuart_macs.strx_inst_cnst_grp_mbrshp", intNumberOfInputParameters, listOutputParameters);
var ParamObjects = new List<object>();
ParamObjects.Add(SPHelper.createTdParameter("i_seq_key", max_seq_key, "IN", TdType.BigInt, 10));
ParamObjects.Add(SPHelper.createTdParameter("i_chpt_cd", "S" + gm.appl_src_cd.Substring(1), "IN", TdType.VarChar, 4));
ParamObjects.Add(SPHelper.createTdParameter("i_nk_ecode", gm.nk_ecode, "IN", TdType.Char, 5));
// rest of the method
}
But with a list Count of 2k, which I tried, it did not insert 2k records into the DB but only 1.
Why does this not insert all the records the input list has?
What am I missing?
Task.Run in a for loop
Even though this is not the question, the title itself is what I'm going to address. For CPU-bound operations you could use Parallel.For or Parallel.ForEach, but since we are IO bound (i.e. database calls) we should rethink this approach. It is also most likely why only one record was inserted: the out variables strSPQuery and listParam are overwritten on every iteration, the fire-and-forget Task.Run lambdas capture the variables rather than their values, and nothing waits for the tasks to complete before the method returns.
The obvious answer here is to create a list of tasks that represent the asynchronous operations and then await them using the Task.WhenAll API, like this:
public async Task InvokeAllTheSqlAsync()
{
var list = gm.ListGroupMembershipUploadDetailsInput.GroupMembershipUploadInputList;
var tasks = Enumerable.Range(0, list.Count).Select(i =>
{
var value = list[i];
string strSPQuery;
List<object> listParam; // matches the DAL's out List<object> parameters
SQL.Upload.UploadDetails.insertGroupMembershipRecords(
value,
max_seq_key++,
max_unit_key++,
out strSPQuery,
out listParam
);
return rep.ExecuteStoredProcedureInputTypeAsync(strSPQuery, listParam);
});
await Task.WhenAll(tasks);
}

How to pass a different range on Parallel.For?

I need to process a single file in parallel by sending skip/take counts like 1-1000, 1001-2000, 2001-3000, etc.
Code for the parallel processing:
var line = File.ReadAllLines("D:\\OUTPUT.CSV").Length;
Parallel.For(1, line, new ParallelOptions { MaxDegreeOfParallelism = 10 }, x
=> {
DoSomething(skip,take);
});
Function
public static void DoSomething(int skip, int take)
{
//code here
}
How can I send the skip and take counts into the parallel processing as per my requirement?
You can do this rather easily with PLINQ. If you want batches of 1000, you can do:
const int BatchSize = 1000;
var pageAmount = (int)Math.Ceiling((double)lines / BatchSize);
Enumerable.Range(0, pageAmount)
    .AsParallel()
    .ForAll(page => DoSomething(page)); // ForAll, because DoSomething returns void and a lazy Select would never execute
public void DoSomething(int page)
{
    // source here stands for the lines read from the file
    var currentLines = source.Skip(page * BatchSize).Take(BatchSize);
    // do something with the selected lines
}
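A possible wiring sketch for the question's file, under the assumption that source is just the array of lines read up front:
var source = File.ReadAllLines("D:\\OUTPUT.CSV"); // read once; source feeds DoSomething
var pageAmount = (int)Math.Ceiling((double)source.Length / BatchSize);
Enumerable.Range(0, pageAmount)
    .AsParallel()
    .WithDegreeOfParallelism(10) // mirrors the question's MaxDegreeOfParallelism = 10
    .ForAll(DoSomething);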
