Performing several tasks in parallel - C#

NOTE: This is for a WinRT Universal app.
I'm fetching data from a service through use of an async method:
var activityList = await Task.Run(() => dataService.GetActivitiesAsync());
But for each activity I still need to load Athlete data.
So now I just loop over the activities and load the athlete data like so (there is no bulk load option for athletes):
foreach (var activity in activityList)
{
    activity.Athlete = dataService.GetAthleteAsync(activity.AthleteID);
}
But this GetAthleteAsync also returns a Task, so I was wondering: isn't there a better way to get this done on a background thread, perhaps with Task.WhenAll or something else?
The Athlete property on the Activity object has NotifyPropertyChanged so the UI will show the needed data when it's set.
The GetAthleteAsync method will already try to cache athletes based on the given id.
Any suggestions on how to make this perform better?
Some details: all methods connect to a web API.
Mostly we'll be targeting 10 unique Athletes (but depending on the user, this could increase).

Hmm, if dataService.GetActivitiesAsync() is async and everything is called from the UI thread, what you can do is this:
// no wrapping in Task, it is async
var activityList = await dataService.GetActivitiesAsync();
// Select a good enough tuple
var results = (from activity in activityList
               select new {
                   Activity = activity,
                   AthleteTask = dataService.GetAthleteAsync(activity.AthleteID)
               }).ToList(); // begin enumeration
// Wait for them to finish, ie relinquish control of the thread
await Task.WhenAll(results.Select(t => t.AthleteTask));
// Set the athletes
foreach (var pair in results)
{
    pair.Activity.Athlete = pair.AthleteTask.Result;
}
(Written off the top of my head, i.e. no syntax checking, so some method calls could be wrong.)

C# Add to a List Asynchronously in API

I have an API which needs to be run in a loop for mass processing.
The current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
My current solution below just builds up a list; would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>();
    foreach (AddressRequestDto request in requests)
    {
        var newCombinedAddress = (await GetCombinedAddress(request)).Value;
        combinedAddressResponses.Add(newCombinedAddress);
    }
    return combinedAddressResponses;
}
Update:
In the debugger, it has to go to combinedAddressResponse.Result.Value;
combinedAddressResponse.Value is null.
Also, strangely, writing combinedAddressResponse.Result.Value gives the error below: "ActionResult does not contain a definition for 'Value' and no accessible extension method".
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
    var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
    foreach (var request in requests)
    {
        tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
    }
    // This waits for all the tasks to complete
    await Task.WhenAll(tasks);
    combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
    return combinedAddressResponses;
}
Looking for a way to speed things up and run in parallel, thanks.
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be part of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    // Build the collection of tasks by doing an asynchronous operation for each request.
    var tasks = requests.Select(async request =>
    {
        var combinedAddressResponse = await GetCombinedAddress(request);
        return combinedAddressResponse.Value;
    }).ToList();
    // Wait for all the tasks to complete and get the results.
    var results = await Task.WhenAll(tasks);
    return results;
}

IBackgroundJobManager sequential jobs using 'ContinueWith'

I am using IBackgroundJobManager with Hangfire integration.
Use case:
I am processing a single uploaded file. After the file is saved, I would like to start two separate Abp.BackgroundJobs sequentially. Only after the first job completes should the second job start.
Here is my code:
var measureJob1 = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    })
    .ContinueWith<AnalyticsBackgroundJob<AnalyticsJobArgsDto>>("measureJob", x => x);
Problem:
I cannot figure out the syntax for what I need when using .ContinueWith<???>(???).
This seems to be an XY Problem.
How to start the second background job after the first completes? (Problem X)
There is no support for guaranteeing the sequential and conditional execution of jobs.
That said, background jobs are automatically retried, which allows for Way 3 below.
So, you can effectively achieve that in one of the following ways:
Combine the two jobs into one job.
Enqueue the second job inside the first job (see the sketch after this list).
Enqueue both jobs (you don't need to use ContinueWith, await is much neater). In the second job, check if the first job created what it needed to. Otherwise, throw an exception and rely on retry.
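For Way 2, here is a minimal sketch, assuming an aspnetboilerplate-style BackgroundJob<TArgs> base class with constructor injection; the class shape is illustrative rather than taken from your code:
// Way 2 (sketch, assumptions noted above): the second job is enqueued only after
// the first job's own work has finished, so it cannot start before the first completes.
public class FileProcessBackgroundJob : BackgroundJob<FileProcessJobArgsDto>, ITransientDependency
{
    private readonly IBackgroundJobManager _backgroundJobManager;

    public FileProcessBackgroundJob(IBackgroundJobManager backgroundJobManager)
    {
        _backgroundJobManager = backgroundJobManager;
    }

    public override void Execute(FileProcessJobArgsDto args)
    {
        // ... process the uploaded file here ...

        // Enqueue the analytics job only after the processing above has finished.
        // Blocking on the enqueue here for brevity; an async job base class would let you await it.
        _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
            new AnalyticsJobArgsDto("measureJob")).GetAwaiter().GetResult();
    }
}
If the processing throws, the analytics job is never enqueued, which is exactly the ordering guarantee you are after.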
What is the syntax for what I need when using ContinueWith? (Solution Y)
There is no syntax for that, other than a less desirable variant of Way 3.
Task.ContinueWith is a C# construct that runs after the EnqueueAsync task, not the actual background job.
The syntax for Way 3 with ContinueWith would be something like:
var measureJobId = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    })
    .ContinueWith(task => _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
        new AnalyticsJobArgsDto("measureJob")))
    .Unwrap();
Compare that to:
var processJobId = await _backgroundJobManager.EnqueueAsync<FileProcessBackgroundJob, FileProcessJobArgsDto>(
    new FileProcessJobArgsDto
    {
        Id = Id,
        User = user,
    });
var measureJobId = await _backgroundJobManager.EnqueueAsync<AnalyticsBackgroundJob, AnalyticsJobArgsDto>(
    new AnalyticsJobArgsDto("measureJob"));

How does parallelization work on async/await?

I have the following code, which I intend to run asynchronously. My goal is that GetPictureForEmployeeAsync() is called in parallel as many times as needed. I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
public Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
    var query = documentRepository.GetRepositoryQuery();
    var employees = query.Where(doc => doc.Gender == tags["gender"]);
    return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
}

private Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
    var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
    var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
    return CreatePicture(tags, base64PictureTask, documentTask);
}

private static async Task<Picture> CreatePicture(IDictionary<string, string> tags, Task<string> base64PictureTask, Task<Employee> documentTask)
{
    var document = await documentTask;
    return new Picture
    {
        EmployeeID = document.ID,
        Data = await base64PictureTask,
        ID = document.ID.ToString(),
        Tags = tags,
    };
}
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this? If not, how should I restructure the code to achieve what I want?
I'd like to make sure that 'await' on CreatePicture does not prevent me from doing so.
It doesn't.
If I understand it correctly, Task.WhenAll() is not affected by the two awaited tasks inside CreatePicture() because GetPictureForEmployeeAsync() is not awaited. Am I right about this?
Yes and no. The WhenAll isn't limited in any way by the awaited tasks in CreatePicture, but that has nothing to do with whether GetPictureForEmployeeAsync is awaited or not. These two lines of code are equivalent in terms of behavior:
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
return Task.WhenAll(employees.Select(async employee => await GetPictureForEmployeeAsync(employee, tags)));
I recommend reading my async intro to get a good understanding of how async and await work with tasks.
Also, since GetPictures has non-trivial logic (GetRepositoryQuery and evaluating tags["gender"]), I recommend using async and await for GetPictures, as such:
public async Task<Picture[]> GetPictures(IDictionary<string, string> tags)
{
    var query = documentRepository.GetRepositoryQuery();
    var employees = query.Where(doc => doc.Gender == tags["gender"]);
    var tasks = employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)).ToList();
    return await Task.WhenAll(tasks);
}
As a final note, you may find your code cleaner if you don't pass around "tasks meant to be awaited" - instead, await them first and pass their result values:
async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
    var base64PictureTask = blobRepository.GetBase64PictureAsync(employee.ID.ToString());
    var documentTask = documentRepository.GetItemAsync(employee.ID.ToString());
    await Task.WhenAll(base64PictureTask, documentTask);
    return CreatePicture(tags, await base64PictureTask, await documentTask);
}

static Picture CreatePicture(IDictionary<string, string> tags, string base64Picture, Employee document)
{
    return new Picture
    {
        EmployeeID = document.ID,
        Data = base64Picture,
        ID = document.ID.ToString(),
        Tags = tags,
    };
}
The thing to keep in mind about calling an async method is that, as soon as an await statement is reached inside that method, control immediately goes back to the code that invoked the async method -- no matter where the await statement happens to be in the method. With a 'normal' method, control doesn't go back to the code that invokes that method until the end of that method is reached.
So in your case, you can do the following:
private async Task<Picture> GetPictureForEmployeeAsync(Employee employee, IDictionary<string, string> tags)
{
    // As soon as we get here, control immediately goes back to the GetPictures
    // method -- no need to store the task in a variable and await it within
    // CreatePicture as you were doing
    var picture = await blobRepository.GetBase64PictureAsync(employee.ID.ToString());
    var document = await documentRepository.GetItemAsync(employee.ID.ToString());
    return CreatePicture(tags, picture, document);
}
Because the first line of code in GetPictureForEmployeeAsync has an await, control will immediately go right back to this line...
return Task.WhenAll(employees.Select(employee => GetPictureForEmployeeAsync(employee, tags)));
...as soon as it is invoked. This will have the effect of all of the employee items getting processed in parallel (well, sort of -- the number of threads that will be allotted to your application will be limited).
As an additional word of advice, if this application is hitting a database or web service to get the pictures or documents, this code will likely cause you issues with running out of available connections. If this is the case, consider using System.Threading.Tasks.Parallel and setting the maximum degree of parallelism, or use SemaphoreSlim to control the number of connections used simultaneously.
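For the SemaphoreSlim option, a minimal sketch of what that throttling could look like (the limit of 10 and the wrapper method are illustrative, not part of the original code):
// Sketch only: limit how many picture/document requests run at the same time.
// The concurrency limit (10) and the wrapper name are placeholders.
private readonly SemaphoreSlim _throttle = new SemaphoreSlim(10);

private async Task<Picture> GetPictureThrottledAsync(Employee employee, IDictionary<string, string> tags)
{
    await _throttle.WaitAsync();
    try
    {
        // At most 10 of these calls are in flight at once.
        return await GetPictureForEmployeeAsync(employee, tags);
    }
    finally
    {
        _throttle.Release();
    }
}
GetPictures would then select GetPictureThrottledAsync instead of GetPictureForEmployeeAsync; the Task.WhenAll call stays the same.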

How to use Threads for Processing Many Tasks

I have a C# requirement for individually processing a 'great many' (perhaps > 100,000) records. Running this process sequentially is proving to be very slow with each record taking a good second or so to complete (with a timeout error set at 5 seconds).
I would like to try running these tasks asynchronously by using a set number of worker 'threads' (I use the term 'thread' here cautiously as I am not sure if I should be looking at a thread, or a task or something else).
I have looked at the ThreadPool, but I can't imagine it could queue the volume of requests required. My ideal pseudo code would look something like this...
public void ProcessRecords() {
    SetMaxNumberOfThreads(20);
    MyRecord rec;
    while ((rec = GetNextRecord()) != null) {
        var task = WaitForNextAvailableThreadFromPool(ProcessRecord(rec));
        task.Start();
    }
}
I will also need a mechanism by which the processing method can report back to the parent/calling class.
Can anyone point me in the right direction with perhaps some example code?
A possible simple solution would be to use a TPL Dataflow block which is a higher abstraction over the TPL with configurations for degree of parallelism and so forth. You simply create the block (ActionBlock in this case), Post everything to it, wait asynchronously for completion and TPL Dataflow handles all the rest for you:
var block = new ActionBlock<MyRecord>(
    rec => ProcessRecord(rec),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    block.Post(rec);
}

block.Complete();
await block.Completion;
Another benefit is that the block starts working as soon as the first record arrives and not only when all the records have been received.
If you need to report back on each record you can use a TransformBlock to do the actual processing and link an ActionBlock to it that does the updates:
var transform = new TransformBlock<MyRecord, Report>(rec =>
{
    ProcessRecord(rec);
    return GenerateReport(rec);
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20 });

var reporter = new ActionBlock<Report>(report =>
{
    RaiseEvent(report); // Or any other mechanism...
});

transform.LinkTo(reporter, new DataflowLinkOptions { PropagateCompletion = true });

MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    transform.Post(rec);
}

transform.Complete();
await transform.Completion;
Have you thought about using parallel processing with Actions?
That is, create a method to process a single record, add each record's method call as an Action to a list, and then run Parallel.ForEach over the list.
Dim list As New List(Of Action)
list.Add(New Action(Sub() MyMethod(myParameter)))
Parallel.ForEach(list, Sub(t) t.Invoke())
This is in vb.net, but I think you get the gist.
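For reference, a rough C# equivalent of the same idea as a sketch; MyRecord, GetNextRecord() and ProcessRecord() come from the question's pseudocode rather than from real APIs:
// Sketch: run ProcessRecord over all records with a bounded degree of parallelism.
var records = new List<MyRecord>();
MyRecord rec;
while ((rec = GetNextRecord()) != null)
{
    records.Add(rec);
}

Parallel.ForEach(
    records,
    new ParallelOptions { MaxDegreeOfParallelism = 20 },
    record => ProcessRecord(record));
Note that Parallel.ForEach uses thread-pool threads and blocks until everything has completed, so it fits CPU-bound work; if ProcessRecord is I/O-bound, the Dataflow approach above (or asynchronous concurrency with Task.WhenAll) is usually the better fit.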

Task caching when performing Tasks in parallel with WhenAll

So I have this small code block that will perform several Tasks in parallel.
// no wrapping in Task, it is async
var activityList = await dataService.GetActivitiesAsync();
// Select a good enough tuple
var results = (from activity in activityList
               select new {
                   Activity = activity,
                   AthleteTask = dataService.GetAthleteAsync(activity.AthleteID)
               }).ToList(); // begin enumeration
// Wait for them to finish, ie relinquish control of the thread
await Task.WhenAll(results.Select(t => t.AthleteTask));
// Set the athletes
foreach (var pair in results)
{
    pair.Activity.Athlete = pair.AthleteTask.Result;
}
So I'm downloading Athlete data for each given Activity. But it could be that we are requesting the same athlete several times.
How can we ensure that the GetAthleteAsync method will only go online to fetch the actual data if it's not yet in our memory cache?
Currently I tried using a ConcurrentDictionary<int, Athlete> inside the GetAthleteAsync method:
private async Task<Athlete> GetAthleteAsync(int athleteID)
{
    if (cacheAthletes.ContainsKey(athleteID))
        return cacheAthletes[athleteID];

    // ** else fetch from web
}
You can change your ConcurrentDictionary to cache the Task<Athlete> instead of just the Athlete. Remember, a Task<T> is a promise - an operation that will eventually result in a T. So, you can cache operations instead of results.
ConcurrentDictionary<int, Task<Athlete>> cacheAthletes;
Then, your logic will go like this: if the operation is already in the cache, return the cached task immediately (synchronously). If it's not, then start the download, add the download operation to the cache, and return the new download operation. Note that all the "download operation" logic is moved to another method:
private Task<Athlete> GetAthleteAsync(int athleteID)
{
    return cacheAthletes.GetOrAdd(athleteID, id => LoadAthleteAsync(id));
}

private async Task<Athlete> LoadAthleteAsync(int athleteID)
{
    // Load from web
}
This way, multiple parallel requests for the same athlete will get the same Task<Athlete>, and each athlete is only downloaded once.
You also need to skip tasks that completed unsuccessfully.
Here's my snippet:
ObjectCache _cache = MemoryCache.Default;
static object _lockObject = new object();

public Task<T> GetAsync<T>(string cacheKey, Func<Task<T>> func, TimeSpan? cacheExpiration = null) where T : class
{
    var task = (Task<T>)_cache[cacheKey];
    if (task != null) return task;
    lock (_lockObject)
    {
        task = (Task<T>)_cache[cacheKey];
        if (task != null) return task;
        task = func();
        // Set is a helper that stores the task in _cache with the given expiration.
        Set(cacheKey, task, cacheExpiration);
        // Evict the entry if the task did not run to completion, so a later call can retry.
        task.ContinueWith(t =>
        {
            if (t.Status != TaskStatus.RanToCompletion)
                _cache.Remove(cacheKey);
        });
    }
    return task;
}
When caching values provided by Task objects, you'd like to make sure the cache implementation ensures that:
No parallel or unnecessary operations to get a value will be started. In your case, this is your question about avoiding multiple GetAthleteAsync calls for the same id.
You don't get negative caching (i.e. caching of failed results); or if you do want it, it needs to be an implementation decision and you need to handle eventually replacing failed results somehow.
Cache users can't get invalidated results from the cache, even if the value is invalidated during an await.
I have a blog post about caching Task objects with example code that ensures all the points above and could be useful in your situation. Basically my solution is to store Lazy<Task<T>> objects in a MemoryCache.
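A minimal sketch of that idea, assuming System.Runtime.Caching's MemoryCache; the helper name and expiration handling are illustrative, and the blog post has the full version:
// Sketch of caching Lazy<Task<T>> in MemoryCache; names and expiration policy are illustrative.
private static readonly ObjectCache Cache = MemoryCache.Default;

public static Task<T> GetOrAddAsync<T>(string key, Func<Task<T>> valueFactory, DateTimeOffset absoluteExpiration)
{
    // Lazy<T> guarantees the factory runs at most once, even if two callers race on the same key.
    var newEntry = new Lazy<Task<T>>(valueFactory);
    var existingEntry = (Lazy<Task<T>>)Cache.AddOrGetExisting(key, newEntry, absoluteExpiration);
    var task = (existingEntry ?? newEntry).Value;

    // Avoid negative caching: drop the entry if the task faulted or was cancelled.
    task.ContinueWith(t =>
    {
        if (t.Status != TaskStatus.RanToCompletion)
            Cache.Remove(key);
    }, TaskContinuationOptions.ExecuteSynchronously);

    return task;
}
In the athlete example, the valueFactory would be the web fetch, keyed by the athlete id.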
