Find Result of Parallel Async Tasks - c#

Based off this question, I'm trying to set up code to save several images to Azure Blob Storage in parallel. The method below works fine, and awaiting Task.WhenAll(tasks) waits for all of them to complete before continuing.
The only trouble is, I would like to be able to find out if each request to store the information in our database actually succeeded. _db.AddImageAsync returns a bool and the code below waits for all tasks to complete but when I check the result of all the tasks each is false (even if I actually returned true inside the brackets).
Each task in the Enumerable says the result has not yet been computed even though I stepped through with breakpoints and each has been carried out.
var tasks = wantedSizes.Select(async (wantedSize, index) =>
{
    var resize = size.CalculateResize(wantedSize.GetMaxSize());
    var quality = wantedSize.GetQuality();
    using (var output = ImageProcessHelper.Process(streams[index], resize, quality))
    {
        var path = await AzureBlobHelper.SaveFileAsync(output, FileType.Image);
        var result = await _db.AddImageAsync(id, wantedSize, imageNumber, path);
        return result;
    }
});
await Task.WhenAll(tasks);
if (!tasks.All(task => task.Result))
    return new ApiResponse(ResponseStatus.Fail);
Any help is much appreciated!

Because .Select( is lazily evaluated and returns an IEnumerable<Task<bool>>, the .Select( runs again every time you iterate over the result. Task.WhenAll(tasks) enumerates it once, and tasks.All(...) enumerates it a second time, which starts a fresh set of tasks that have not completed yet. Throw a .ToList() on it to make it a List<Task<bool>>: the .Select( then executes only once, and every later enumeration goes over the same tasks in the returned List<Task<bool>>, which has no side effects.
var tasks = wantedSizes.Select(async (wantedSize, index) =>
{
    var resize = size.CalculateResize(wantedSize.GetMaxSize());
    var quality = wantedSize.GetQuality();
    using (var output = ImageProcessHelper.Process(streams[index], resize, quality))
    {
        var path = await AzureBlobHelper.SaveFileAsync(output, FileType.Image);
        //Double check your documentation, is _db.AddImageAsync thread safe?
        var result = await _db.AddImageAsync(id, wantedSize, imageNumber, path);
        return result;
    }
}).ToList(); //We run the Select once here to process the .ToList().
await Task.WhenAll(tasks); //This is the first enumeration of the variable "tasks".
if (!tasks.All(task => task.Result)) //This is a 2nd enumeration of the variable.
    return new ApiResponse(ResponseStatus.Fail);
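To see the re-enumeration effect in isolation, here is a minimal sketch. SaveAsync and the Started counter are made up for the demo (they stand in for the upload and database call); only the LINQ and Task calls are real APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredSelectDemo
{
    // Counts how many times the (hypothetical) async work is started.
    public static int Started;

    // Stand-in for the real upload-and-save work; always reports success.
    private static async Task<bool> SaveAsync(int size)
    {
        Started++;              // runs synchronously, before the first await
        await Task.Delay(10);
        return true;
    }

    public static async Task<(int lazyStarts, int listStarts)> RunAsync()
    {
        var sizes = new[] { 100, 200, 300 };

        // Lazy: every enumeration re-runs the lambda and starts new tasks.
        Started = 0;
        IEnumerable<Task<bool>> lazy = sizes.Select(s => SaveAsync(s));
        await Task.WhenAll(lazy);            // first enumeration: 3 tasks
        var secondPass = lazy.ToArray();     // second enumeration: 3 more tasks
        int lazyStarts = Started;
        await Task.WhenAll(secondPass);      // let the extra tasks finish

        // Materialized: the lambda runs once; later passes see the same tasks.
        Started = 0;
        List<Task<bool>> list = sizes.Select(s => SaveAsync(s)).ToList();
        await Task.WhenAll(list);
        bool allOk = list.All(t => t.Result); // safe: every task has completed
        int listStarts = Started;

        Console.WriteLine($"lazy: {lazyStarts} starts, list: {listStarts} starts, allOk: {allOk}");
        return (lazyStarts, listStarts);
    }

    public static Task Main() => RunAsync();
}
```

With three sizes, the lazy version starts the work six times and the list version three times, so the second enumeration in the question was looking at brand-new tasks.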

Related

Parallelize C# Graph API SDK methods

I'm connecting to MS Graph API and fetching transitive group members via the following logic:
var queryOptions = new List<QueryOption>()
{
new QueryOption("$count", "true")
};
var lstTemp = graphClient.Groups[$"{groupID}"].TransitiveMembers
.Request(queryOptions)
.Header("ConsistencyLevel", "eventual")
.Select("id,mail,onPremisesSecurityIdentifier").Top(999)
.GetAsync().GetAwaiter().GetResult();
var lstGroups = lstTemp.CurrentPage.Where(x => x.ODataType.Contains("group")).ToList();
while (lstTemp.NextPageRequest != null)
{
lstTemp = lstTemp.NextPageRequest.GetAsync().GetAwaiter().GetResult();
lstGroups.AddRange(lstTemp.CurrentPage.Where(x => x.ODataType.Contains("group")).ToList());
}
Although the above logic works fine, for larger data sets where the result count is around 10K records or more, I've noticed that fetching all of the results takes around 10-12 seconds.
I'm looking for a way to parallelize the API calls (multi-threading or tasks) so that the overall time to get the complete results is reduced.
In C# we have Parallel.For etc. Can I use it in this scenario to replace the regular while loop above?
Any suggestions?
Not really using the Parallel.For api, but you can execute a bunch of asynchronous tasks concurrently by throwing them into a List<Task<T>> and awaiting the whole list with Task.WhenAll. Your code may look something like this:
var queryOptions = new List<QueryOption>()
{
new QueryOption("$count", "true")
};
// Creating the first request
var firstRequest = graphClient.Groups[$"{groupID}"].TransitiveMembers
.Request(queryOptions)
.Header("ConsistencyLevel", "eventual")
.Select("id,mail,onPremisesSecurityIdentifier").Top(999)
.GetAsync();
// Creating a list of all requests (starting with the first one)
var requests = new List<Task<IGroupTransitiveMembersCollectionWithReferencesPage>>() { firstRequest };
// Awaiting the first response
var firstResponse = await firstRequest;
// Getting the total count from the request
var count = (int) firstResponse.AdditionalData["@odata.count"];
// Setting offset to the amount of data you already pulled
var offset = 999;
while (offset < count)
{
// Creating the next request
var nextRequest = graphClient.Groups[$"{groupID}"].TransitiveMembers
.Request() // Notice no $count=true (may potentially hurt performance and we don't need it anymore anyways)
.Header("ConsistencyLevel", "eventual")
.Select("id,mail,onPremisesSecurityIdentifier")
.Skip(offset).Top(999) // Skipping the data you already pulled
.GetAsync();
// Adding it to the list
requests.Add(nextRequest);
// Increasing the offset
offset += 999;
}
// Waiting for all the requests to finish
var allResponses = await Task.WhenAll(requests);
// This flattens the pages into one list while filtering as you did
var lstGroups = allResponses
.Select(x => x.CurrentPage)
.SelectMany(page => page.Where(member => member.ODataType.Contains("group")))
.ToList();
Couldn't check if this code works without a Graph tenant, so you might need to modify a bit, but I hope you can see the general idea.
Also I allowed myself to refactor the code to use proper async/await since it's good and standard practice to do that, but it should work with .GetAwaiter().GetResult() if you can't use await in your context for some reason (please consider, though).
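The pattern itself (work out the offsets up front, start every page request, then await them together) can be tried without a tenant. The sketch below does not call Graph at all: FetchPageAsync, Source and the even-number filter are hypothetical stand-ins for the page request, the directory data and the "group" filter, and it assumes the service accepts offset-based paging, which has not been verified here for the real endpoint.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PagedFetchDemo
{
    private const int PageSize = 999;

    // In-memory stand-in for the remote collection (2500 items).
    private static readonly int[] Source = Enumerable.Range(0, 2500).ToArray();

    // Hypothetical page request: returns up to PageSize items starting at offset.
    private static async Task<int[]> FetchPageAsync(int offset)
    {
        await Task.Delay(10); // simulated network latency
        return Source.Skip(offset).Take(PageSize).ToArray();
    }

    public static async Task<List<int>> FetchAllAsync()
    {
        // In the real code this would come from the first response's count.
        int total = Source.Length;

        // Start every page request before awaiting any of them.
        var requests = new List<Task<int[]>>();
        for (int offset = 0; offset < total; offset += PageSize)
            requests.Add(FetchPageAsync(offset));

        // WhenAll returns the pages in the same order as the requests.
        int[][] pages = await Task.WhenAll(requests);

        // Flatten and filter (even numbers play the role of "groups").
        return pages.SelectMany(page => page).Where(x => x % 2 == 0).ToList();
    }

    public static async Task Main()
    {
        var items = await FetchAllAsync();
        Console.WriteLine($"kept {items.Count} items");
    }
}
```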

Task.WhenAll on List<Task> behaving differently than Task.WhenAll on IEnumerable<Task>

I'm seeing some odd behavioral differences between calling Task.WhenAll(IEnumerable<Task<T>>) and calling Task.WhenAll(List<Task<T>>) while trying to catch exceptions.
My code is as follows:
public async Task Run()
{
var en = GetResources(new []{"a","b","c","d"});
await foreach (var item in en)
{
var res = item.Select(x => x.Id).ToArray();
System.Console.WriteLine(string.Join("-> ", res));
}
}
private async IAsyncEnumerable<IEnumerable<ResponseObj>> GetResources(
IEnumerable<string> identifiers)
{
IEnumerable<IEnumerable<string>> groupedIds = identifiers.Batch(2);
// MoreLinq extension method -- batches IEnumerable<T>
// into IEnumerable<IEnumerable<T>>
foreach (var batch in groupedIds)
{
//GetHttpResource is simply a wrapper around HttpClient which
//makes an Http request to an API endpoint with the given parameter
var tasks = batch.Select(id => ac.GetHttpResourceAsync(id)).ToList();
// if I remove this ToList(), the behavior changes
var stats = tasks.Select(t => t.Status);
// at this point the status being WaitingForActivation is reasonable
// since I have not awaited yet
IEnumerable<ResponseObj> res = null;
var taskGroup = Task.WhenAll(tasks);
try
{
res = await taskGroup;
var awaitedStats = tasks.Select(t => t.Status);
//this is the part that changes
//if I have .ToList(), the statuses are RanToCompletion or Faulted
//if I don't have .ToList(), the statuses are always WaitingForActivation
}
catch (Exception ex)
{
var exceptions = taskGroup.Exception.InnerException;
DoSomethingWithExceptions(exceptions);
res = tasks.Where(g => !g.IsFaulted).Select(t => t.Result);
//throws an exception because all tasks are WaitingForActivation
}
yield return res;
}
}
Ultimately, I have an IEnumerable of identifiers, I'm batching that into batches of 2 (hard coded in this example), and then running Task.WhenAll to run each batch of 2 at the same time.
What I want is if 1 of the 2 GetResource tasks fails, to still return the successful result of the other, and handle the exception (say, write it to a log).
If I run Task.WhenAll on a list of tasks, this works exactly how I want. However, if I remove the .ToList(), when I attempt to find my faulted tasks in the catch block after the await taskGroup, I run into problems because the statuses of my tasks are still WaitingForActivation although I believe they have been awaited.
When there is no exception thrown, the List and IEnumerable act the same way. This only starts causing issues when I try to catch exceptions.
What is the reasoning behind this behavior? The Task.WhenAll must have completed since I get into the catch block, however why are the statuses still WaitingForActivation? Have I failed to grasp something fundamental here?
Unless you materialize the sequence (by calling ToList()), each time you enumerate it you call GetHttpResourceAsync again and create a new task. This is due to deferred execution. The tasks you inspect after the await are therefore not the ones Task.WhenAll awaited, which is why they are still WaitingForActivation.
I would definitely keep the ToList() call when working with a collection of tasks.
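For the "keep the successful results, log the failures" part, a minimal sketch along these lines should work once the tasks are in a list. GetResourceAsync is a made-up stand-in for ac.GetHttpResourceAsync that fails for the id "b"; the rest uses only standard Task members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartialResultsDemo
{
    // Hypothetical stand-in for ac.GetHttpResourceAsync: fails for the id "b".
    private static async Task<string> GetResourceAsync(string id)
    {
        await Task.Delay(10);
        if (id == "b") throw new InvalidOperationException($"failed: {id}");
        return id.ToUpperInvariant();
    }

    public static async Task<(List<string> ok, List<Exception> errors)> RunBatchAsync(
        IEnumerable<string> ids)
    {
        // ToList() starts each request exactly once and keeps the task objects.
        List<Task<string>> tasks = ids.Select(id => GetResourceAsync(id)).ToList();
        try
        {
            await Task.WhenAll(tasks);
        }
        catch
        {
            // await rethrows only the first exception; inspect the tasks instead.
        }

        // Every task has finished by now, so reading Result does not block.
        var ok = tasks.Where(t => t.Status == TaskStatus.RanToCompletion)
                      .Select(t => t.Result)
                      .ToList();
        var errors = tasks.Where(t => t.IsFaulted)
                          .SelectMany(t => t.Exception.InnerExceptions)
                          .ToList();
        return (ok, errors);
    }

    public static async Task Main()
    {
        var (ok, errors) = await RunBatchAsync(new[] { "a", "b" });
        Console.WriteLine($"ok: {string.Join(",", ok)}; errors: {errors.Count}");
    }
}
```

Reading the exceptions from each task rather than from taskGroup.Exception.InnerException also gives you every failure in the batch, not just the first one.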

How does parallelization work on async/await?

I have the following code, that I intend to run asynchronously. My goal is that GetPictureForEmployeeAsync() is called in parallel as many times as needed. I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
public Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
}
private Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
return CreatePicture(tags, base64PictureTask, documentTask);
}
private static async Task<Picture> CreatePicture(IDictionary<string, string> tags, Task<string> base64PictureTask, Task<Employee> documentTask)
{
var document = await documentTask;
return new Picture
{
EmployeeID = document.ID,
Data = await base64PictureTask,
ID = document.ID.ToString(),
Tags = tags,
};
}
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this? If not, how should I restructure the code to achieve what I want?
I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
It doesn't.
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this?
Yes and no. The WhenAll isn't limited in any way by the awaited tasks in CreatePicture, but that has nothing to do with whether GetPictureForEmployeeAsync is awaited or not. These two lines of code are equivalent in terms of behavior:
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
return Task.WhenAll(employees.Select(async employee => await GetPictureForEmployeeAsync(employee, tags)));
I recommend reading my async intro to get a good understanding of how async and await work with tasks.
Also, since GetPictures has non-trivial logic (GetRepositoryQuery and evaluating tags["gender"]), I recommend using async and await for GetPictures, as such:
public async Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
var query = documentRepository.GetRepositoryQuery();
var employees = query.Where(doc => doc.Gender == tags["gender"]);
var tasks = employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)).ToList();
return await Task.WhenAll(tasks);
}
As a final note, you may find your code cleaner if you don't pass around "tasks meant to be awaited" - instead, await them first and pass their result values:
async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
await Task.WhenAll(base64PictureTask, documentTask);
return CreatePicture(tags, await base64PictureTask, await documentTask);
}
static Picture CreatePicture(IDictionary<string, string> tags, string base64Picture, Employee document)
{
return new Picture
{
EmployeeID = document.ID,
Data = base64Picture,
ID = document.ID.ToString(),
Tags = tags,
};
}
The thing to keep in mind about calling an async method is that, as soon as it reaches an await on a task that has not completed yet, control goes back to the code that invoked the async method -- no matter where that await happens to be in the method -- and the caller receives a Task representing the rest of the work. With a 'normal' method, control doesn't go back to the code that invokes that method until the end of that method is reached.
So in your case, you can do the following:
private async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
    // At the first await below, control goes back to the GetPictures method
    // (assuming the call has not already completed) -- no need to store the
    // task in a variable and await it within CreatePicture as you were doing.
    // Note that, for each employee, the document request starts only after
    // the picture has arrived.
    var picture = await blobRepository.GetBase64PictureAsync(employee.ID.ToString());
    var document = await documentRepository.GetItemAsync(employee.ID.ToString());
    return CreatePicture(tags, picture, document);
}
Because the first line of code in GetPictureForEmployeeAsync has an await, control will immediately go right back to this line...
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
...as soon as that await is reached. This will have the effect of all of the employee items getting processed concurrently (well, sort of -- the requests overlap while they wait on I/O, and the number of threads that will be allotted to your application to run the continuations will be limited).
As an additional word of advice, if this application is hitting a database or web service to get the pictures or documents, this code will likely cause you issues with running out of available connections. If this is the case, consider using System.Threading.Tasks.Parallel and setting the maximum degree of parallelism, or use SemaphoreSlim to control the number of connections used simultaneously.
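As a rough sketch of the SemaphoreSlim option: FetchAsync below is a hypothetical stand-in for the database or blob call, and the _running/_peak counters exist only to show that the limit is respected. The limit of 3 is an arbitrary choice for the demo.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    private static readonly object Sync = new object();
    private static int _running;
    private static int _peak;

    // Hypothetical stand-in for a database or blob call.
    private static async Task<int> FetchAsync(int id)
    {
        lock (Sync) { _running++; _peak = Math.Max(_peak, _running); }
        await Task.Delay(20); // simulated I/O
        lock (Sync) { _running--; }
        return id * 2;
    }

    public static async Task<(int[] results, int peak)> RunAsync(int count, int maxConcurrency)
    {
        lock (Sync) { _running = 0; _peak = 0; }
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = Enumerable.Range(1, count).Select(async id =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try { return await FetchAsync(id); }
                finally { gate.Release(); }  // always free the slot
            }).ToList();

            // All tasks finish here, before the semaphore is disposed.
            int[] results = await Task.WhenAll(tasks);
            lock (Sync) { return (results, _peak); }
        }
    }

    public static async Task Main()
    {
        var (results, peak) = await RunAsync(10, 3);
        Console.WriteLine($"results: {string.Join(",", results)}; peak concurrency: {peak}");
    }
}
```

All ten tasks are created immediately, but no more than three of them are inside FetchAsync at any moment; the others are parked on WaitAsync without blocking a thread.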

Better way to collect results from async tasks

My MVC app occasionally results in a deadlock. I think it is likely due to a faulty way I am collecting data from completed async tasks.
I have two independent async methods.
var task1 = GetNamesFromSource1Async(); // a database call, may throw an exception
var task2 = GetNamesFromSource2Async(); // a database call, may throw an exception
var total = new List<string>();
await Task.WhenAll(task1, task2).ConfigureAwait(false);
Question part 1: What is the safest recommended way and the best practice to collect results from these tasks:
// here I already know that both tasks are completed and
// I am using (or abusing?) await to get task results
List<string> names1 = await task1.ConfigureAwait(false);
List<string> names2 = await task2.ConfigureAwait(false);
if (names1 != null) total.AddRange(names1);
if (names2 != null) total.AddRange(names2);
or
total.AddRange(task1.IsFaulted ? new List<string>() : task1.Result);
total.AddRange(task2.IsFaulted ? new List<string>() : task2.Result);
?
Question part 2: in addition if I want to transform data from the first source, is it safe to use ContinueWith (when I say safe I mean from the standpoint of deadlocks)
var task1 = GetNamesFromSource1Async().ContinueWith(t =>
{
    if (!t.IsFaulted && t.Result != null)
    {
        return t.Result.Take(1).ToList();
    }
    return new List<string>(); // every code path must return a value
});
Remark: here I am trying to control for exceptions within each of the tasks by checking the IsFaulted flag.
A recommendation on the best practice to solve this problem would be highly appreciated. I am using .NET 4.5
.Result is a blocking call and can lead to a deadlock when mixed with async/await. Await the tasks instead:
var task1 = GetNamesFromSource1Async(); // a database call, may throw an exception
var task2 = GetNamesFromSource2Async(); // a database call, may throw an exception
var total = new List<string>();
var results = await Task.WhenAll(task1, task2);
total.AddRange(results.Where(s => s != null && s.Count > 0).SelectMany(s => s));
Update
The above assumed the return types were all the same.
However from your comment...
How would you modify the last line if I still need to collect results
but task1 and task2 are based on different types?
and referencing this answer
Awaiting multiple Tasks with different results
Then it would be modified as
var task1 = GetNamesFromSource1Async(); // a database call, may throw an exception
var task2 = GetNamesFromSource2Async(); // a database call, may throw an exception
var total = new List<string>();
await Task.WhenAll(task1, task2);
List<String> names1 = await task1;
List<int> names2 = await task2;
//...process results
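If one failing source should not discard the other's results, one option is to wrap each task in a small async helper instead of checking IsFaulted or using ContinueWith. The sketch below is only an illustration: OrDefaultAsync and FirstNameOnlyAsync are made-up helpers, and the two source methods are fakes (the second always throws).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CollectResultsDemo
{
    // Fake sources for the demo; the second one always fails.
    private static async Task<List<string>> GetNamesFromSource1Async()
    {
        await Task.Delay(10);
        return new List<string> { "Ann", "Bob" };
    }

    private static async Task<List<string>> GetNamesFromSource2Async()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("source 2 is down");
    }

    // Awaits a task and substitutes a fallback if it throws,
    // so callers never need to touch .Result or IsFaulted.
    private static async Task<T> OrDefaultAsync<T>(Task<T> task, T fallback)
    {
        try { return await task.ConfigureAwait(false); }
        catch (Exception ex)
        {
            Console.WriteLine($"ignored failure: {ex.Message}");
            return fallback;
        }
    }

    // Plays the role of the ContinueWith transformation: keep the first name only.
    private static async Task<List<string>> FirstNameOnlyAsync()
    {
        var names = await GetNamesFromSource1Async().ConfigureAwait(false);
        return names?.Take(1).ToList() ?? new List<string>();
    }

    public static async Task<List<string>> GetAllNamesAsync()
    {
        // Both calls are started before either is awaited.
        var task1 = OrDefaultAsync(FirstNameOnlyAsync(), new List<string>());
        var task2 = OrDefaultAsync(GetNamesFromSource2Async(), new List<string>());

        List<string>[] results = await Task.WhenAll(task1, task2).ConfigureAwait(false);
        return results.SelectMany(r => r).ToList();
    }

    public static async Task Main()
    {
        var total = await GetAllNamesAsync();
        Console.WriteLine(string.Join(",", total));
    }
}
```

Nothing here blocks on a task, so this pattern cannot cause the .Result style of deadlock; whether swallowing the exception is acceptable depends on your application.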

async and await while adding elements to List<T>

I wrote a method which adds elements to the List from many sources. See below:
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
List<SearchingItem> searchingItems = new List<SearchingItem>();
foreach (Place place in await GetPlaces())
{
searchingItems.Add(new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE });
}
foreach (Group group in await GetGroups())
{
searchingItems.Add(new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
}
return searchingItems;
}
I tested this method and it works properly. I suppose that it works properly because the GetPlaces method returns 160 elements and GetGroups returns 3000. But I was wondering whether it will still work if the methods return elements at the same time. Should I lock the list searchingItems?
Thank you for advice.
Your items do not run at the same time. You start GetPlaces(), stop and wait for its result, then go into the first loop. You then start GetGroups(), stop and wait for its result, then go into the second loop. Your loops are not concurrent, so you have no need to lock while adding to the list.
However, as you may have noticed, your two async methods are not concurrent either. You can easily modify your program to make them so:
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
List<SearchingItem> searchingItems = new List<SearchingItem>();
var getPlacesTask = GetPlaces();
var getGroupsTask = GetGroups();
foreach (Place place in await getPlacesTask)
{
searchingItems.Add(new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE });
}
foreach (Group group in await getGroupsTask)
{
searchingItems.Add(new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
}
return searchingItems;
}
What this will do will start GetPlaces(), start GetGroups(), stop and wait for GetPlaces() result, then go in to the first loop, stop and wait for GetGroups() result, then go in to the second loop.
The two loops are still not concurrent, but your two awaitable methods are, which may give you a small performance boost. I doubt you would get any benefit from making the loops concurrent: they appear to just be building models, and the overhead of making that thread safe would not be worth it for how little work is being done.
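If you want to measure the difference between the two shapes, a small timing sketch like this one can help. GetAsync is a hypothetical stand-in for GetPlaces()/GetGroups() that just waits 100 ms; the exact timings will vary from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class StartThenAwaitDemo
{
    // Hypothetical stand-in for GetPlaces()/GetGroups(): takes about 100 ms.
    private static async Task<string> GetAsync(string name)
    {
        await Task.Delay(100);
        return name;
    }

    public static async Task<(long sequentialMs, long concurrentMs)> RunAsync()
    {
        // Shape 1: start, await, start, await (the calls run back to back).
        var sw = Stopwatch.StartNew();
        await GetAsync("places");
        await GetAsync("groups");
        long sequentialMs = sw.ElapsedMilliseconds;

        // Shape 2: start both, then await each (the waits overlap).
        sw.Restart();
        var placesTask = GetAsync("places");
        var groupsTask = GetAsync("groups");
        await placesTask;
        await groupsTask;
        long concurrentMs = sw.ElapsedMilliseconds;

        return (sequentialMs, concurrentMs);
    }

    public static async Task Main()
    {
        var (sequentialMs, concurrentMs) = await RunAsync();
        Console.WriteLine($"sequential: {sequentialMs} ms, concurrent: {concurrentMs} ms");
    }
}
```

The second shape should take roughly as long as the slower of the two calls rather than the sum of both.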
If you really want to try to make it more parallel (though I doubt you will see much benefit), you can use PLINQ to build your models.
public static async Task<List<SearchingItem>> GetItemsToSelect()
{
var getPlacesTask = GetPlaces();
var getGroupsTask = GetGroups();
var places = await getPlacesTask;
//Just make the initial list from the LINQ object.
List<SearchingItem> searchingItems = places.AsParallel().Select(place=>
new SearchingItem() {
IdFromRealModel=place.Id, NameToDisplay=place.FullName,
ExtraInformation=place.Name, TypeOfSearchingItem=TypeOfSearchingItem.PLACE
}).ToList();
var groups = await getGroupsTask;
//build up a PLINQ IEnumerable
var groupSearchItems = groups.AsParallel().Select(group=>
new SearchingItem()
{
IdFromRealModel = group.Id, NameToDisplay = group.Name,
ExtraInformation = group.TypeName, TypeOfSearchingItem = TypeOfSearchingItem.GROUP
});
//The building of the IEnumerable was parallel but the adding is serial.
searchingItems.AddRange(groupSearchItems);
return searchingItems;
}
