async and await while adding elements to List<T> - c#

I wrote method, which adds elements to the List from many sources. See below:
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
List<SearchingItem> searchingItems = new List<SearchingItem>();
foreach (Place place in await GetPlaces())
{
searchingItems.Add(new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE });
}
foreach (Group group in await GetGroups())
{
searchingItems.Add(new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
}
return searchingItems;
}
I tested this method and works propertly. I suppose that it works propertly, because GetPlaces method return 160 elements and GetGroups return 3000. But, I was wondering if it will work if the methods return elements in the same time. Should I lock list searchingItems ?
Thank you for advice.

Your items do not run at the same time, you start GetPlaces(), stop and wait for GetPlaces() result, then go in to the first loop. You then start GetGroups(), stop and wait for GetGroups() result, then go in to the second loop. Your loops are not concurrent so you have no need to lock while adding them.
However if you have noticed your two async methods are also not concurrent, you can easily modify your program to make it so though.
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
List<SearchingItem> searchingItems = new List<SearchingItem>();
var getPlacesTask = GetPlaces();
var getGroupsTask = GetGroups();
foreach (Place place in await getPlacesTask)
{
searchingItems.Add(new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE });
}
foreach (Group group in await getGroupsTask)
{
searchingItems.Add(new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
}
return searchingItems;
}
What this will do will start GetPlaces(), start GetGroups(), stop and wait for GetPlaces() result, then go in to the first loop, stop and wait for GetGroups() result, then go in to the second loop.
The two loops are still not concurrent, but your two await-able methods are which may give you a small performance boost. I doubt you would get any benifit from making the loops concurrent, they appear to just be building models and the overhead of making it thread safe would not be worth it for how little work is being done.
If you really wanted to try and make it more parallel (but I doubt you will see much benefit) is use PLINQ to build your models.
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
var getPlacesTask = GetPlaces();
var getGroupsTask = GetGroups();
var places = await getPlacesTask;
//Just make the initial list from the LINQ object.
List<SearchingItem> searchingItems = places.AsParallel().Select(place=>
new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE
}).ToList();
var groups = await getGroupsTask;
//build up a PLINQ IEnumerable
var groupSearchItems = groups.AsParallel().Select(group=>
new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
//The building of the IEnumerable was parallel but the adding is serial.
searchingItems.AddRange(groupSearchItems);
return searchingItems;
}

Related

How does parallelization work on async/await?

I have the following code, that I intend to run asynchronously. My goal is that GetPictureForEmployeeAsync() is called in parallel as many times as needed. I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
public Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
}
private Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, base64PictureTask, documentTask);
}
private static async Task<Picture> CreatePicture(IDictionary<string, string> tags, Task<string> base64PictureTask, Task<Employee> documentTask)
{
var document = await documentTask;
return new Picture
{
EmployeeID = document.ID,
Data = await base64PictureTask,
ID = document.ID.ToString(),
Tags = tags,
};
}
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this? If not, how should I restructure the code to achieve what I want?
I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
It doesn't.
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this?
Yes and no. The WhenAll isn't limited in any way by the awaited tasks in CreatePicture, but that has nothing to do with whether GetPictureForEmployeeAsync is awaited or not. These two lines of code are equivalent in terms of behavior:
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
return Task.WhenAll(employees.Select(async employee => await GetPictureForEmployeeAsync(employee, tags)));
I recommend reading my async intro to get a good understanding of how async and await work with tasks.
Also, since GetPictures has non-trivial logic (GetRepositoryQuery and evaluating tags["gender"]), I recommend using async and await for GetPictures, as such:
public async Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
var tasks = employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)).ToList();
return await Task.WhenAll(tasks);
}
As a final note, you may find your code cleaner if you don't pass around "tasks meant to be awaited" - instead, await them first and pass their result values:
async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
await Task.WhenAll(base64PictureTask, documentTask);
return CreatePicture(tags, await base64PictureTask, await documentTask);
}
static Picture CreatePicture(IDictionary<string, string> tags, string base64Picture, Employee document)
{
return new Picture
{
EmployeeID = document.ID,
Data = base64Picture,
ID = document.ID.ToString(),
Tags = tags,
};
}
The thing to keep in mind about calling an async method is that, as soon as an await statement is reached inside that method, control immediately goes back to the code that invoked the async method -- no matter where the await statement happens to be in the method. With a 'normal' method, control doesn't go back to the code that invokes that method until the end of that method is reached.
So in your case, you can do the following:
private async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
// As soon as we get here, control immediately goes back to the GetPictures
// method -- no need to store the task in a variable and await it within
// CreatePicture as you were doing
var picture = await blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var document = await documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, picture, document);
}
Because the first line of code in GetPictureForEmployeeAsync has an await, control will immediately go right back to this line...
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
...as soon as it is invoked. This will have the effect of all of the employee items getting processed in parallel (well, sort of -- the number of threads that will be allotted to your application will be limited).
As an additional word of advice, if this application is hitting a database or web service to get the pictures or documents, this code will likely cause you issues with running out of available connections. If this is the case, consider using System.Threading.Tasks.Parallel and setting the maximum degree of parallelism, or use SemaphoreSlim to control the number of connections used simultaneously.

Async/Await in loop behaviour

Say I had a list of algorithms, each containing an async call somewhere in the body of that algorithm. The order in which I execute the algorithms is the order in which I want to receive the results. That is, I want the AlgorithmResult List to look like {Algorithm1Result, Algorithm2Result, Algorithm3Result} after all the algorithms have executed. Would I be right in saying that if Algorithm1 and 3 finished before 2 that my results would actually be in the order {Algorithm1Result, Algorithm3Result, Algorithm2Result}
var algorithms = new List<Algorithm>(){Algorithm1, Algorithm2, Algorithm3};
var algorithmResults = new List<AlgorithmResults>();
foreach (var algorithm in algorithms)
{
algorithmResults.Add(await algorithm.Execute());
}
NO, Result would be in the same order you added it to the list, since each operation is being waited for separately.
class Program
{
public static async Task<int> GetResult(int timing)
{
return await Task<int>.Run(() =>
{
Thread.Sleep(timing * 1000);
return timing;
});
}
public static async Task<List<int>> GetAll()
{
List<int> tasks = new List<int>();
tasks.Add(await GetResult(3));
tasks.Add(await GetResult(2));
tasks.Add(await GetResult(1));
return tasks;
}
static void Main(string[] args)
{
var res = GetAll().Result;
}
}
res anyway contains list in order it was added, also this is not parallel execution.
Since you don't add the tasks, but the results of the task after awaiting, they will be in the order you want.
Even if you did not await the result before adding the next task you could get the order you want:
in small steps, showing type awareness:
List<Task<AlgorithmResult>> tasks = new List<Task<AlgorithmResult>>();
foreach (Algorithm algorithm in algorithms)
{
Task<AlgorithmResult> task = algorithm.Execute();
// don't wait until task Completes, just remember the Task
// and continue executing the next Algorithm
tasks.Add(task);
}
Now some Tasks may be running, some may already have completed. Let's wait until they are all complete, and fetch the results:
Task.WhenAll(tasks.ToArray());
List<AlgrithmResults> results = tasks.Select(task => task.Result).ToList();

Await all Tasks in the list, and after that update the list base on task result

I want to run all tasks with WhenAll (not one by one).
But after that I need to update list (LastReport property) base on result.
I think I have solution but I would like to check if there is better way.
Idea is to:
Run all tasks
Remember relation between configuration and task
Update configuration
My solution is:
var lastReportAllTasks = new List<Task<Dictionary<string, string>>>();
var configurationTaskRelation = new Dictionary<int, Task<Dictionary<string, string>>>();
foreach (var configuration in MachineConfigurations)
{
var task = machineService.GetReports(configuration);
lastReportAllTasks.Add(task);
configurationTaskRelation.Add(configuration.Id, task);
}
await Task.WhenAll(lastReportAllTasks);
foreach (var configuration in MachineConfigurations)
{
var lastReportTask = configurationTaskRelation[configuration.Id];
configuration.LastReport = await lastReportTask;
}
The Select function can be asynchronous itself. You can await the report and return both the configuration and result in the same result object (anonymous type or tuple, whatever you prefer) :
var tasks=MachineConfigurations.Select(async conf=>{
var report= await machineService.GetReports(conf);
return new {conf,report});
var results=await Task.WhenAll(tasks);
foreach(var pair in results)
{
pair.conf.LastReport=pair.report;
}
EDIT - Loops and error handling
As Servy suggested, Task.WhenAll can be ommited and awaiting can be moved inside the loop :
foreach(var task in tasks)
{
var pair=await task;
pair.conf.LastReport=pair.report;
}
The tasks will still execute concurrently. In case of exception though, some configuration objects will be modified and some not.
In general, this would be an ugly situation, requiring extra exception handling code to clean up the modified objects. Exception handling is a lot easier when modifications are done on-the-side and finalized/applied when the happy path completes. That's one reason why updating the Configuration objects inside the Select() requires careful consideration.
In this particular case though it may be better to "skip" the failed reports, possibly move them to an error queue and reprocess them at a later time. It may be better to have partial results than no results at all, as long as this behaviour is expected:
foreach(var task in tasks)
{
try
{
var pair=await task;
pair.conf.LastReport=pair.report;
}
catch(Exception exc)
{
//Make sure the error is logged
Log.Error(exc);
ErrorQueue.Enqueue(new ProcessingError(conf,ex);
}
}
//Handle errors after the loop
EDIT 2 - Dataflow
For completeness, I do have several thousand ticket reports to generate each day, and each GDS call (the service through which every travel agency sells tickets) takes considerable time. I can't run all requests at the same time - I start getting server serialization errors if I try more than 10 concurrent requests. I can't retry everything either.
In this case I used TPL DataFlow combined with some Railway oriented programming tricks. An ActionBlock with a DOP of 8 processes the ticket requests. The results are wrapped in a Success class and sent to the next block. Failed requests and exceptions are wrapped in a Failure class and sent to another block. Both classes inherit from IFlowEnvelope which has a Successful flag. Yes, that's F# Discriminated Union envy.
This is combined with some retry logic for timeouts etc.
In pseudocode the pipeline looks like this :
var reportingBlock=new TransformBlock<Ticket,IFlowEnvelope<TicketReport>(reportFunc,dopOptions);
var happyBlock = new ActionBlock<IFlowEnvelope<TicketReport>>(storeToDb);
var errorBlock = new ActionBlock<IFlowEnvelope<TicketReport>>(logError);
reportingBlock.LinkTo(happyBlock,linkOptions,msg=>msg.Success);
reportingBlock.LinkTo(errorBlock,linkOptions,msg=>!msg.Success);
foreach(var ticket in tickets)
{
reportingBlock.Post(ticket);
}
reportFunc catches any exceptions and wraps them as Failure<T> objects:
async Task<IFlowEnvelope<Ticket,TicketReport>> reportFunc(Ticket ticket)
{
try
{
//Do the heavy processing
return new Success<TicketReport>(report);
}
catch(Exception exc)
{
//Construct an error message, msg
return new Failure<TicketReport>(report,msg);
}
}
The real pipeline includes steps that parse daily reports and individual tickets. Each call to the GDS takes 1-6 seconds so the complexity of the pipeline is justified.
I think you don't need Lists or Dictionaries. Why not simple loop which updates LastReport with results
foreach (var configuration in MachineConfigurations)
{
configuration.LastReport = await machineService.GetReports(configuration);
}
For executing all reports "in parallel"
Func<Configuration, Task> loadReport =
async config => config.LastReport = await machineService.GetReports(config);
await Task.WhenAll(MachineConfigurations.Select(loadReport));
And very poor try to be more functional.
Func<Configuration, Task<Configuration>> getConfigWithReportAsync =
async config =>
{
var report = await machineService.GetReports(config);
return new Configuration
{
Id = config.Id,
LastReport = report
};
}
var configsWithUpdatedReports =
await Task.WhenAll(MachineConfigurations.Select(getConfigWithReportAsync));
using System.Linq;
var taskResultsWithConfiguration = MachineConfigurations.Select(conf =>
new { Conf = conf, Task = machineService.GetReports(conf) }).ToList();
await Task.WhenAll(taskResultsWithConfiguration.Select(pair => pair.Task));
foreach (var pair in taskResultsWithConfiguration)
pair.Conf.LastReport = pair.Task.Result;

How to properly execute a List of Tasks async in C#

I have a list of objects that I need to run a long running process on and I would like to kick them off asynchronously, then when they are all finished return them as a list to the calling method. I've been trying different methods that I have found, however it appears that the processes are still running synchronously in the order that they are in the list. So I am sure that I am missing something in the process of how to execute a list of tasks.
Here is my code:
public async Task<List<ShipmentOverview>> GetShipmentByStatus(ShipmentFilterModel filter)
{
if (string.IsNullOrEmpty(filter.Status))
{
throw new InvalidShipmentStatusException(filter.Status);
}
var lookups = GetLookups(false, Brownells.ConsolidatedShipping.Constants.ShipmentStatusType);
var lookup = lookups.SingleOrDefault(sd => sd.Name.ToLower() == filter.Status.ToLower());
if (lookup != null)
{
filter.StatusId = lookup.Id;
var shipments = Shipments.GetShipments(filter);
var tasks = shipments.Select(async model => await GetOverview(model)).ToList();
ShipmentOverview[] finishedTask = await Task.WhenAll(tasks);
return finishedTask.ToList();
}
else
{
throw new InvalidShipmentStatusException(filter.Status);
}
}
private async Task<ShipmentOverview> GetOverview(ShipmentModel model)
{
String version;
var user = AuthContext.GetUserSecurityModel(Identity.Token, out version) as UserSecurityModel;
var profile = AuthContext.GetProfileSecurityModel(user.Profiles.First());
var overview = new ShipmentOverview
{
Id = model.Id,
CanView = true,
CanClose = profile.HasFeatureAction("Shipments", "Close", "POST"),
CanClear = profile.HasFeatureAction("Shipments", "Clear", "POST"),
CanEdit = profile.HasFeatureAction("Shipments", "Get", "PUT"),
ShipmentNumber = model.ShipmentNumber.ToString(),
ShipmentName = model.Name,
};
var parcels = Shipments.GetParcelsInShipment(model.Id);
overview.NumberParcels = parcels.Count;
var orders = parcels.Select(s => WareHouseClient.GetOrderNumberFromParcelId(s.ParcelNumber)).ToList();
overview.NumberOrders = orders.Distinct().Count();
//check validations
var vals = Shipments.GetShipmentValidations(model.Id);
if (model.ValidationTypeId == Constants.OrderValidationType)
{
if (vals.Count > 0)
{
overview.NumberOrdersTotal = vals.Count();
overview.NumberParcelsTotal = vals.Sum(s => WareHouseClient.GetParcelsPerOrder(s.ValidateReference));
}
}
return overview;
}
It looks like you're using asynchronous methods while you really want threads.
Asynchronous methods yield control back to the calling method when an async method is called, then wait until the methods has completed on the await. You can see how it works here.
Basically, the only usefulness of async/await methods is not to lock the UI, so that it stays responsive.
If you want to fire multiple processings in parallel, you will want to use threads, like such:
using System.Threading.Tasks;
public void MainMethod() {
// Parallel.ForEach will automagically run the "right" number of threads in parallel
Parallel.ForEach(shipments, shipment => ProcessShipment(shipment));
// do something when all shipments have been processed
}
public void ProcessShipment(Shipment shipment) { ... }
Marking the method as async doesn't auto-magically make it execute in parallel. Since you're not using await at all, it will in fact execute completely synchronously as if it wasn't async. You might have read somewhere that async makes functions execute asynchronously, but this simply isn't true - forget it. The only thing it does is build a state machine to handle task continuations for you when you use await and actually build all the code to manage those tasks and their error handling.
If your code is mostly I/O bound, use the asynchronous APIs with await to make sure the methods actually execute in parallel. If they are CPU bound, a Task.Run (or Parallel.ForEach) will work best.
Also, there's no point in doing .Select(async model => await GetOverview(model). It's almost equivalent to .Select(model => GetOverview(model). In any case, since the method actually doesn't return an asynchronous task, it will be executed while doing the Select, long before you get to the Task.WhenAll.
Given this, even the GetShipmentByStatus's async is pretty much useless - you only use await to await the Task.WhenAll, but since all the tasks are already completed by that point, it will simply complete synchronously.
If your tasks are CPU bound and not I/O bound, then here is the pattern I believe you're looking for:
static void Main(string[] args) {
Task firstStepTask = Task.Run(() => firstStep());
Task secondStepTask = Task.Run(() => secondStep());
//...
Task finalStepTask = Task.Factory.ContinueWhenAll(
new Task[] { step1Task, step2Task }, //more if more than two steps...
(previousTasks) => finalStep());
finalStepTask.Wait();
}

creating a .net async wrapper to a sync request

I have the following situation (or a basic misunderstanding with the async await mechanism).
Assume you have a set of 1-20 web request call that takes a long time: findItemsByProduct().
you want to wrap it around in an async request, that would be able to abstract all these calls into one async call, but I can't seem to be able to do it without using more threads.
If I'm doing:
int total = result.paginationOutput.totalPages;
for (int i = 2; i < total + 1; i++)
{
await Task.Factory.StartNew(() =>
{
result = client.findItemsByProduct(i);
});
newList.AddRange(result.searchResult.item);
}
}
return newList;
problem here, that the calls don't run together, rather they are waiting one by one.
I would like all the calls to run together and than harvest the results.
as pseudo code, I would like the code to run like this:
forEach item {
result = item.makeWebRequest();
}
foreach item {
List.addRange(item.harvestResults);
}
I have no idea how to make the code to do that though..
Ideally, you should add a findItemsByProductAsync that returns a Task<Item[]>. That way, you don't have to create unnecessary tasks using StartNew or Task.Run.
Then your code can look like this:
int total = result.paginationOutput.totalPages;
// Start all downloads; each download is represented by a task.
Task<Item[]>[] tasks = Enumerable.Range(2, total - 1)
.Select(i => client.findItemsByProductAsync(i)).ToArray();
// Wait for all downloads to complete.
Item[][] results = await Task.WhenAll(tasks);
// Flatten the results into a single collection.
return results.SelectMany(x => x).ToArray();
Given your requirements which I see as:
Process n number of non-blocking tasks
Process results after all queries have returned
I would use the CountdownEvent for this e.g.
var results = new ConcurrentBag<ItemType>(result.pagination.totalPages);
using (var e = new CountdownEvent(result.pagination.totalPages))
{
for (int i = 2; i <= result.pagination.totalPages+1; i++)
{
Task.Factory.StartNew(() => return client.findItemsByProduct(i))
.ContinueWith(items => {
results.AddRange(items);
e.Signal(); // signal task is done
});
}
// Wait for all requests to complete
e.Wait();
}
// Process results
foreach (var item in results)
{
...
}
This particular problem is solved easily enough without even using await. Simply create each of the tasks, put all of the tasks into a list, and then use WhenAll on that list to get a task that represents the completion of all of those tasks:
public static Task<Item[]> Foo()
{
int total = result.paginationOutput.totalPages;
var tasks = new List<Task<Item>>();
for (int i = 2; i < total + 1; i++)
{
tasks.Add(Task.Factory.StartNew(() => client.findItemsByProduct(i)));
}
return Task.WhenAll(tasks);
}
Also note you have a major problem in how you use result in your code. You're having each of the different tasks all using the same variable, so there are race conditions as to whether or not it works properly. You could end up adding the same call twice and having one skipped entirely. Instead you should have the call to findItemsByProduct be the result of the task, and use that task's Result.
If you want to use async-await properly you have to declare your functions async, and the functions that call you also have to be async. This continues until you have once synchronous function that starts the async process.
Your function would look like this:
by the way you didn't describe what's in the list. I assume they are
object of type T. in that case result.SearchResult.Item returns
IEnumerable
private async Task<List<T>> FindItems(...)
{
int total = result.paginationOutput.totalPages;
var newList = new List<T>();
for (int i = 2; i < total + 1; i++)
{
IEnumerable<T> result = await Task.Factory.StartNew(() =>
{
return client.findItemsByProduct(i);
});
newList.AddRange(result.searchResult.item);
}
return newList;
}
If you do it this way, your function will be asynchronous, but the findItemsByProduct will be executed one after another. If you want to execute them simultaneously you should not await for the result, but start the next task before the previous one is finished. Once all tasks are started wait until all are finished. Like this:
private async Task<List<T>> FindItems(...)
{
int total = result.paginationOutput.totalPages;
var tasks= new List<Task<IEnumerable<T>>>();
// start all tasks. don't wait for the result yet
for (int i = 2; i < total + 1; i++)
{
Task<IEnumerable<T>> task = Task.Factory.StartNew(() =>
{
return client.findItemsByProduct(i);
});
tasks.Add(task);
}
// now that all tasks are started, wait until all are finished
await Task.WhenAll(tasks);
// the result of each task is now in task.Result
// the type of result is IEnumerable<T>
// put all into one big list using some linq:
return tasks.SelectMany ( task => task.Result.SearchResult.Item)
.ToList();
// if you're not familiar to linq yet, use a foreach:
var newList = new List<T>();
foreach (var task in tasks)
{
newList.AddRange(task.Result.searchResult.item);
}
return newList;
}

Categories