How to invoke a list of Task<T> by using Parallel - c#

I've got a list of async calls that are lined up in specific order and it does not matter which one finishes first or last. All of these async Task returns Bitmaps. All of the async Task return a single Bitmap accept for one and it returns a list of Bitmaps List.
For testing purposes and me being able to get a better handle on the difference of using Parallel vs just Task I need someone to show me how to invoke each one of these async Task and set a local variable that contains a list of all the returned async results.
How to Parallel.ForEach of these task
How to retrieve the value of each completed task and set a local variable with the returned result.
---Code where I just await each Task one after another.
public async static Task<PdfSharp.Pdf.PdfDocument> RollUpDrawingsPDF(IElevation elevation)
{
List<Bitmap> allSheets = new List<Bitmap>();
var processedParts = new PartsProcessor.PartProcessor().ProcessParts(elevation);
//elevation
allSheets.Add(await ShopDrawing.Manager.GetElevationDrawing(elevation, true, RotateFlipType.Rotate90FlipNone));
//door schedules, 3 schedules per sheet
allSheets.AddRange(await ShopDrawing.Door.GetDoorSecheduleSheets(elevation, RotateFlipType.Rotate90FlipNone, 3));
//materials list
allSheets.Add(await MaterialsList.Manager.GetMaterialList(processedParts).GetDrawing());
//optimized parts
allSheets.Add(await Optimization.Manager.GetOptimizedParts(processedParts).GetDrawing());
//cut sheet
allSheets.Add(await CutSheet.Manager.GetCutSheet(processedParts).GetDrawing());
return await PDFMaker.PDFManager.GetPDF(allSheets, true);
}
------Code I'm tring to run in Parallel.ForEach however this isn't working but a starting place for help. For each returned task result I need to set the local variable of allSheets of that Parallel Task Result.
public async static Task<PdfSharp.Pdf.PdfDocument> RollUpDrawingsPDF(IElevation elevation)
{
List<Bitmap> allSheets = new List<Bitmap>();
var processedParts = new PartsProcessor.PartProcessor().ProcessParts(elevation);
Task[] myTask = new Task[5];
myTask[0] = ShopDrawing.Manager.GetElevationDrawing(elevation, true, RotateFlipType.Rotate90FlipNone);
myTask[1] = ShopDrawing.Door.GetDoorSecheduleSheets(elevation, RotateFlipType.Rotate90FlipNone, 3);
myTask[2] = MaterialsList.Manager.GetMaterialList(processedParts).GetDrawing();
myTask[3] = Optimization.Manager.GetOptimizedParts(processedParts).GetDrawing();
myTask[4] = CutSheet.Manager.GetCutSheet(processedParts).GetDrawing();
var x = Parallel.ForEach(myTask, t => t.Wait());
////elevation
//allSheets.Add(await );
////door schedules, 3 schedules per sheet
//allSheets.AddRange(await);
////materials list
//allSheets.Add(await );
////optimized parts
//allSheets.Add(await );
////cut sheet
//allSheets.Add(await );
return await PDFMaker.PDFManager.GetPDF(allSheets, true);
}
How would I implement the Parallel.ForEach for this body of code?
*Discussion code example. How to return a List when other methods return one Bitmap*
async Task<Bitmap[]> RollUpHelper(IElevation elevation, PartsProcessor.ProcessedParts processedParts)
{
return await Task<Bitmap[]>.WhenAll(
ShopDrawing.Manager.GetElevationDrawing(elevation, true, RotateFlipType.Rotate90FlipNone),
//ShopDrawing.Door.GetDoorSecheduleSheets(elevation,RotateFlipType.Rotate90FlipNone, 3),
MaterialsList.Manager.GetMaterialList(processedParts).GetDrawing(),
MaterialsList.Manager.GetMaterialList(processedParts).GetDrawing(),
CutSheet.Manager.GetCutSheet(processedParts).GetDrawing()
);
}

Parallel.ForEach() is for running multiple synchronous operations in parallel.
You want to wait for a number of asynchronous Tasks to finish:
await Task.WhenAll(tasks);

To expand upon SLaks' answer:
Task.WhenAll() will return an array of all the results returned by the tasks it waited for, so you don't need to manage that yourself.
Here's an example where I use string instead of Bitmap as in your example. Note how one of the workers doesn't return a List<string> and I convert it to a List<string> with one item, to make it the same type as the others.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
class Data
{
public string Value;
public Data(string value) { Value = value; }
}
class Program
{
async Task<List<string>[]> RunAsync()
{
return await Task.WhenAll
(
Task.Factory.StartNew(() =>
new List<string> {Worker1(new Data("One"))}),
Task.Factory.StartNew(() =>
Worker2(new Data("Two"))),
Task.Factory.StartNew(() =>
Worker3(new Data("Three")))
);
}
void Run()
{
var results = RunAsync().Result;
// Now results is an array of List<string>, so we can iterate the results.
foreach (var result in results)
{
result.Print();
Console.WriteLine("--------------");
}
}
string Worker1(Data data)
{
Thread.Sleep(1000);
return data.Value;
}
List<string> Worker2(Data data)
{
Thread.Sleep(1500);
return Enumerable.Repeat(data.Value, 2).ToList();
}
List<string> Worker3(Data data)
{
Thread.Sleep(2000);
return Enumerable.Repeat(data.Value, 3).ToList();
}
static void Main()
{
new Program().Run();
}
}
static class DemoUtil
{
public static void Print(this object self)
{
Console.WriteLine(self);
}
public static void Print(this string self)
{
Console.WriteLine(self);
}
public static void Print<T>(this IEnumerable<T> self)
{
foreach (var item in self)
Console.WriteLine(item);
}
}
}

Related

Multiple tasks returns incorrect result

What I need to do
I need to start different instances of a class in a synchronous context using an async method.
Application structure
In my console application I've declared a List<Bot> class:
private List<Bot> _bots = new List<Bot>(new Bot[10]);
the class Bot contains some methods that takes data from internet, so these methods need to be waited. The method structure looks like this:
public class Bot
{
Competition Comp { get; set; }
public async Task StartAsync(int instance)
{
string url = "";
//based on the instance I take the data from different source.
switch(instance)
{
case 0:
url = "www.google.com";
break;
case 1:
url = "www.bing.com";
break;
}
//Comp property contains different groups.
Comp.Groups = await GetCompetitionAsync(Comp, url);
if(Comp.Groups.Count > 0)
{
foreach(var gp in group)
{
//add data inside database.
}
}
}
}
the Competition class have the following design:
public class Competition
{
public string Name { get; set; }
public List<string> Groups { get; set; }
}
I start all the instances of Bot class using the following code:
for(int i = 0; i < _bots.Count - 1; i++)
{
_bots[i].StartAsync(i);
}
this code will call different times StartAsync of Bot class, in this way, I can manage each instance of the bot, and I can eventually stop or start a specific instance in a separate method.
The problem
The method GetCompetitionAsync create a List<string>:
public async Task<List<string>> GetCompetitionAsync(Competition comp, string url)
{
if(comp == null)
comp = new Competition();
List<string> groups = new List<string();
using (var httpResonse = await httpClient.GetAsync(url))
{
string content = await httpResponse.Content.ReadAsStringAsync();
//fill list groups
}
return groups;
}
essentially this method will fill the List<string> available in Comp. Now, if I execute a single instance of StartAsync all works well, but when I run multiple instances (as the for above), the Comp object (which contains the Competition) have all the properties NULL.
So seems that when I have multiple Task running the synchronous context doesn't wait the async context, which in this case fill the List<string>.
When the code reach this line: if(Competition.Groups.Count > 0) I get a NULL exception, because Groups is null, and other Comp properties are NULL.
How can I manage this situation?
UPDATE
After other attempts, I though to create a List<Task> instead of a List<Bot>:
List<Task> tasks = new List<Task>(new Task[10]);
then instead of:
for(int i = 0; i < _bots.Count - 1; i++)
{
_bots[i].StartAsync(i);
}
I did:
for (int i = 0; i < tasks.Count - 1; i++)
{
Console.WriteLine("Starting " + i);
if (tasks[i] == null)
tasks[i] = new Task(async () => await new Bot().StartAsync(i));
apparently all is working well, I got no errors. The problem is: why? I though to something like a deadlock, that I can't even solve using ConfigureAwait(false);.
The last solution also doesn't allow me to access to Bot method because is now a Task.
UPDATE 2
Okay maybe I gotcha the issue. Essentially the await inside the asynchronous method StartAsync is trying to comeback on the main thread, meanwhile the main thread is busy waiting the task to complete, and this will create a deadlock.
That's why moving the StartAsync() inside a List<Task> has worked, 'cause now the async call is now running on a thread pool thread, it doesn't try to comeback to the main thread, and everything seems to works. But I can't use this solution for the reasons explained above.
I'm prefer use Threads instead of Tasks. IMHO, Threads more simple for understanding.
Note: seems that property Bot.Comp in your code is NOT initialized! I fix this issue.
My version of your code:
public class Bot
{
Competition Comp { get; set; }
System.Thread _thread;
private int _instance;
public Bot()
{
Comp = new Competition ();
}
public void Start(int instance)
{
_instance = instance;
_thread = new Thread(StartAsync);
_thread.Start();
}
private void StartAsync()
{
string url = "";
//based on the instance I take the data from different source.
switch(_instance)
{
case 0:
url = "www.google.com";
break;
case 1:
url = "www.bing.com";
break;
}
//Comp property contains different groups.
GetCompetitionAsync(Comp, url);
if(Comp.Groups.Count > 0)
{
foreach(var gp in group)
{
//add data inside database.
}
}
}
public List<string> GetCompetitionAsync(Competition comp, string url)
{
if(comp.groups == null) comp.groups = new List<string>();
using (var httpResonse = httpClient.GetAsync(url))
{
string content = await httpResponse.Content.ReadAsStringAsync();
//fill list groups
}
return groups;
}
}
Then we run threads:
for(int i = 0; i < _bots.Count - 1; i++)
{
_bots[i].Start(i);
}
Each instance of Bot starts method private void StartAsync() in it's own thread.
Note a implementation of method Bot.Start():
public void Start(int instance)
{
_instance = instance;
_thread = new Thread(StartAsync); //At this line: set method Bot.StartAsync as entry point for new thread.
_thread.Start();//At this line: call of _thread.Start() starts new thread and returns **immediately**.
}
This sort of thing is far simpler if you think in terms of lists and "pure" functions-- functions that accept input and return output. Don't pass in something for them to fill or mutate.
For example, this function accepts a string and returns the groups:
List<string> ExtractGroups(string content)
{
var list = new List<string>();
//Populate list
return list;
}
This function accepts a URL and returns its groups.
async Task<List<string>> GetCompetitionAsync(string url)
{
using (var httpResponse = await httpClient.GetAsync(url))
{
string content = await httpResponse.Content.ReadAsStringAsync();
return ExtractGroups(content);
}
}
And this function accepts a list of URLs and returns all of the groups as one list.
async Task<List<string>> GetAllGroups(string[] urls)
{
var tasks = urls.Select( u => GetCompetitionAsync(u) );
await Task.WhenAll(tasks);
return tasks.SelectMany( t => t.Result );
}
You can then stuff the data into the database as you had planned.
var groups = GetAllGroups( new string[] { "www.google.com", "www.bing.com" } );
foreach(var gp in groups)
{
//add data inside database.
}
See how much simpler it is when you break it down this way?

How to set a Dynamic Parallel Task Execution instead of manual call dispatch?

What is the best approach in this scenario, where in there are multiple tasks that need to be run, depending on a given parameter. See the code below:
void Mapping()
{
if(param.IsProgram1) {
// spGetProgram1() is a stored procedure call
MapProgram1(context, spGetProgram1().GetIterator());
if(param.IsProgram2) {
MapProgram2(context, spGetProgram2().GetIterator());
}
}
if(param.IsProgram3) {
MapProgram3(context, spGetProgram23().GetIteratior());
}
}
static void MapProgram1(context, IEnumerable<IDataRecord> records) {
// map records to context
}
static void MapProgram2(context, IEnumerable<IDataRecord> records) {
// map records to context
}
static void MapProgram3(context, IEnumerable<IDataRecord> records) {
// map records to context
}
I want to refactor this and run the tasks in parallel. My current approach is that I store the task in a dictionary and invoke it depending on the given parameter
var tasks = new Dictionary<string, Task<IEnumerable<IDataRecord>>>()
{
{ "MapProgram1", null },
{ "MapProgram2", null },
{ "MapProgram3", null },
}
if(param.IsProgram1) tasks["MapProgram1"].Value = Task.Run(() => spGetProgram1().GetIterator())
... ...
tasks.WaitAll(tasks.Select(t => t.Value).ToArray());
And then fetch the result and call the corresponding mapping method like this
foreach(var t in tasks.Where(t => t.Value != null))
{
if(t.Key == "MapProgram1")
{
MapProgram1(context, t.Value.Result);
}
if (t.Key == "MapProgram2")
{
MapProgram2(context, t.Value.Result);
}
.....
}
I'm sure there's a cleaner approach on this, that I don't need to manually call the methods.
I'm not sure I followed your example; however, in general if you want to execute tasks in parallel and conditionally choose which tasks to execute, you might do something like this...
async Task Run() {
List<Task<IResult>> tasks = new List<Task<IResult>>();
if (someCondition) {
tasks.Add(RunSome(someParams));
}
if (othercondition) {
tasks.Add(RunOther(otherParam));
}
IResult[] results = await Task.WhenAll(tasks);
foreach (var result in results) {
if (result is SomeResult someResult) {
// Handle some result
}
else if (result is OtherResult otherResult) {
// Handle other result
}
}
}
static async Task<SomeResult> RunSome(someParams) {
// Run something
}
static async Task<OtherResult> RunOther(otherParams) {
// Run other thing
}
interface IResult {
}
class SomeResult : IResult {
}
class OtherResult: IResult {
}
You add the tasks you want to execute in parallel to a list, have each task return an instance of some result type, and then check the results after all tasks are completed.
You can do a lot with Parallel.ForEach i.e adjust the degree of parallelism, ect
However it does depend on your work load and what you are trying to achieve, however this may give you food for thought
var list = new List<Stuff>();
Parallel.ForEach(list, (item) =>
{
switch(item,ConditionType)
{
case ConditionType.First : DoSomethingWithItem(item); break;
case ConditionType.Second : DoSomethingElseWithItem(item); break;
}
});
More resources
Parallel.ForEach Method
Msdn : How to: Write a Simple Parallel.ForEach Loop

C# Multi-threading, wait for all task to complete in a situation when new tasks are being constantly added

I have a situation where new tasks are being constantly generated and added to a ConcurrentBag<Tasks>.
I need to wait all tasks to complete.
Waiting for all the tasks in the ConcurrentBag via WaitAll is not enough as the number of tasks would have grown while the previous wait is completed.
At the moment I am waiting it in the following way:
private void WaitAllTasks()
{
while (true)
{
int countAtStart = _tasks.Count();
Task.WaitAll(_tasks.ToArray());
int countAtEnd = _tasks.Count();
if (countAtStart == countAtEnd)
{
break;
}
#if DEBUG
if (_tasks.Count() > 100)
{
tokenSource.Cancel();
break;
}
#endif
}
}
I am not very happy with the while(true) solution.
Can anyone suggest a better more efficient way to do this (without having to pool the processor constantly with a while(true))
Additional context information as requested in the comments. I don't think though this is relevant to the question.
This piece of code is used in a web crawler. The crawler scans page content and looks for two type of information. Data Pages and Link Pages. Data pages will be scanned and data will be collected, Link Pages will be scanned and more links will be collected from them.
As each of the tasks carry-on the activities and find more links, they add the links to an EventList. There is an event OnAdd on the list (code below) that is used to trigger other task to scan the newly added URLs. And so forth.
The job is complete when there are no more running tasks (so no more links will be added) and all items have been processed.
public IEventList<ISearchStatus> CurrentLinks { get; private set; }
public IEventList<IDataStatus> CurrentData { get; private set; }
public IEventList<System.Dynamic.ExpandoObject> ResultData { get; set; }
private readonly ConcurrentBag<Task> _tasks = new ConcurrentBag<Task>();
private readonly CancellationTokenSource tokenSource = new CancellationTokenSource();
private readonly CancellationToken token;
public void Search(ISearchDefinition search)
{
CurrentLinks.OnAdd += UrlAdded;
CurrentData.OnAdd += DataUrlAdded;
var status = new SearchStatus(search);
CurrentLinks.Add(status);
WaitAllTasks();
_exporter.Export(ResultData as IList<System.Dynamic.ExpandoObject>);
}
private void DataUrlAdded(object o, EventArgs e)
{
var item = o as IDataStatus;
if (item == null)
{
return;
}
_tasks.Add(Task.Factory.StartNew(() => ProcessObjectSearch(item), token));
}
private void UrlAdded(object o, EventArgs e)
{
var item = o as ISearchStatus;
if (item==null)
{
return;
}
_tasks.Add(Task.Factory.StartNew(() => ProcessFollow(item), token));
_tasks.Add(Task.Factory.StartNew(() => ProcessData(item), token));
}
public class EventList<T> : List<T>, IEventList<T>
{
public EventHandler OnAdd { get; set; }
private readonly object locker = new object();
public new void Add(T item)
{
//lock (locker)
{
base.Add(item);
}
OnAdd?.Invoke(item, null);
}
public new bool Contains(T item)
{
//lock (locker)
{
return base.Contains(item);
}
}
}
I think that this task can be done with TPL Dataflow library with very basic setup. You'll need a TransformManyBlock<Task, IEnumerable<DataTask>> and an ActionBlock (may be more of them) for actual data processing, like this:
// queue for a new urls to parse
var buffer = new BufferBlock<ParseTask>();
// parser itself, returns many data tasks from one url
// similar to LINQ.SelectMany method
var transform = new TransformManyBlock<ParseTask, DataTask>(task =>
{
// get all the additional urls to parse
var parsedLinks = GetLinkTasks(task);
// get all the data to parse
var parsedData = GetDataTasks(task);
// setup additional links to be parsed
foreach (var parsedLink in parsedLinks)
{
buffer.Post(parsedLink);
}
// return all the data to be processed
return parsedData;
});
// actual data processing
var consumer = new ActionBlock<DataTask>(s => ProcessData(s));
After that you need to link the blocks between each over:
buffer.LinkTo(transform, new DataflowLinkOptions { PropagateCompletion = true });
transform.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });
Now you have a nice pipeline which will execute in background. At the moment you realize that everything you need is parsed, you simply call the Complete method for a block so it stops accepting news messages. After the buffer became empty, it will propagate the completion down the pipeline to transform block, which will propagate it down to consumer(s), and you need to wait for Completion task:
// no additional links would be accepted
buffer.Complete();
// after all the tasks are done, this will get fired
await consumer.Completion;
You can check the moment for a completion, for example, if both buffer' Count property and transform' InputCount and transform' CurrentDegreeOfParallelism (this is internal property for the TransformManyBlock) are equal to 0.
However, I suggested you to implement some additional logic here to determine current transformers number, as using the internal logic isn't a great solution. As for cancelling the pipeline, you can create a TPL block with a CancellationToken, either the one for all, or a dedicated for each block, getting the cancellation out of box.
Why not write one function that yields your tasks as necessary, when they are created? This way you can just use Task.WhenAll to wait for them to complete or, have I missed the point? See this working here.
using System;
using System.Threading.Tasks;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
try
{
Task.WhenAll(GetLazilyGeneratedSequenceOfTasks()).Wait();
Console.WriteLine("Fisnished.");
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
public static IEnumerable<Task> GetLazilyGeneratedSequenceOfTasks()
{
var random = new Random();
var finished = false;
while (!finished)
{
var n = random.Next(1, 2001);
if (n < 50)
{
finished = true;
}
if (n > 499)
{
yield return Task.Delay(n);
}
Task.Delay(20).Wait();
}
yield break;
}
}
Alternatively, if your question is not as trivial as my answer may suggest, I'd consider a mesh with TPL Dataflow. The combination of a BufferBlock and an ActionBlock would get you very close to what you need. You could start here.
Either way, I'd suggest you want to include a provision for accepting a CancellationToken or two.

Task <T> retaining starting parameters

When creating a Task, is it possible to record the parameters that were used to start the task.
Take to following as an example (just a prototype, it's not real!).
static void Main(string[] args)
{
ICollection<Task<int>> taskList = new List<Task<int>>();
// Create a set of tasks
for (int i = 1; i <= 10; i++)
{
var local_i = i; // Local scoped variable
Task<int> t = new Task<int>(() =>
{
return myFunc(local_i);
});
t.Start();
taskList.Add(t);
}
// Wait for all the tasks to complete.
Task.WaitAll(taskList.ToArray());
// Output the results
foreach (var tsk in taskList)
{
// the "???" should be the input value to the task
System.Diagnostics.Debug.WriteLine("Input: ??? - Result: "+tsk.Result);
}
}
static int myFunc(int i)
{
return (i * i);
}
When the results are output, I want to know what input variable was provided to myFunc() that produced the result
Besides returning a Tuple with both values, you can also make taskList an ICollection<Tuple<int, Task<int>>> and store the parameter there. To make it simpler, you might create your own class for that:
class TaskInfo<T>
{
public Task<T> Task { get; set; }
public T Parameter { get; set; }
}
And then
var taskList = new List<TaskInfo<int>>();
...
taskList.Add(new TaskInfo { Task = t, Parameter = local_i });
If you can change myFunc change the return type, so it will return the input and result as a Tuple.
If you can't you could use a Dictionary or List<Tuple<input,Task>> to store the input along with the task (instead of your ICollection)

Issue with extension method

Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Collections;
namespace LambdaExtensionEx
{
class Program
{
static void Main(string[] args)
{
string[] myStrngs = new string[] { "Super","Simple","Okay"};
IEnumerable<string> criteraStrngs = myStrngs.WhereSearch<string>(delegate(string s)
{
return s.StartsWith("s");
});
string s1 = "dotnet";
int count = s1.GetCount();
Console.ReadLine();
}
}
public static class MyExtension
{
public delegate bool Criteria<T>(T Value);
public static IEnumerable<T> WhereSearch<T>(this IEnumerable<T> values, Criteria<T> critera)
{
foreach (T value in values)
if (critera(value))
yield return value;
}
public static int GetCount(this string value)
{
return value.Count();
}
}
}
I am able to call GetCount extension method and I get result in 'count'. But WhereSearch is not being called in any time and not getting result. What mistake am I doing?
You need to start enumerating over the result returned by the WhereSearch function if you want it to get executed. The reason for that is because this function yield returns an IEnumerable<T>. What the C# compiler does is build a state machine and doesn't execute the function immediately until the calling code starts enumerating over the result.
For example:
// The function is not executed at this stage because this function uses
// yield return to build an IEnumerable<T>
IEnumerable<string> criteraStrngs = myStrngs.WhereSearch<string>(delegate(string s)
{
return s.StartsWith("s");
});
// Here we start enumerating over the result => the code will start executing
// the function.
foreach (var item in criteraStrngs)
{
Console.WriteLine(item);
}
Another example is calling some of the LINQ extension methods such as .ToList() on the result which will actually enumerate and call the function:
IEnumerable<string> criteraStrngs = myStrngs.WhereSearch<string>(delegate(string s)
{
return s.StartsWith("s");
})
.ToList();
For more details on how lazy loading works in this case you may take a look at the following post.
Your extension methods class doesn't make much sense - why don't you just do this?
class Program
{
static void Main(string[] args)
{
string[] myStrngs = new string[] { "Super", "Simple", "Okay" };
IEnumerable<string> criteraStrngs = myStrngs.Where(x => x.StartsWith("s"));
string s1 = "dotnet";
int count = s1.Count();
Console.ReadLine();
}
}
As stated by Darin Dimitrov previously - you will still need to enumerate criteriaStrngs to get a result out of it - which is what was wrong with your original code.
HTH

Categories