Task caching when performing Tasks in parallel with WhenAll - c#

So I have this small code block that will perform several Tasks in parallel.
// No wrapping in Task.Run needed, the method is already async
var activityList = await dataService.GetActivitiesAsync();

// Select a good enough tuple
var results = (from activity in activityList
               select new
               {
                   Activity = activity,
                   AthleteTask = dataService.GetAthleteAsync(activity.AthleteID)
               }).ToList(); // begin enumeration so the requests start

// Wait for them to finish, i.e. relinquish control of the thread
await Task.WhenAll(results.Select(t => t.AthleteTask));

// Set the athletes
foreach (var pair in results)
{
    pair.Activity.Athlete = pair.AthleteTask.Result;
}
So I'm downloading Athlete data for each given Activity. But it could be that we are requesting the same athlete several times.
How can we ensure that the GetAthleteAsync method will only go online to fetch the actual data if it's not yet in our memory cache?
Currently I tried using a ConcurrentDictionary<int, Athlete> inside the GetAthleteAsync method:
private async Task<Athlete> GetAthleteAsync(int athleteID)
{
    if (cacheAthletes.TryGetValue(athleteID, out var athlete))
        return athlete;
    // ... else fetch from the web and add to the cache
}

You can change your ConcurrentDictionary to cache the Task<Athlete> instead of just the Athlete. Remember, a Task<T> is a promise - an operation that will eventually result in a T. So, you can cache operations instead of results.
ConcurrentDictionary<int, Task<Athlete>> cacheAthletes;
Then, your logic will go like this: if the operation is already in the cache, return the cached task immediately (synchronously). If it's not, then start the download, add the download operation to the cache, and return the new download operation. Note that all the "download operation" logic is moved to another method:
private Task<Athlete> GetAthleteAsync(int athleteID)
{
    return cacheAthletes.GetOrAdd(athleteID, id => LoadAthleteAsync(id));
}

private async Task<Athlete> LoadAthleteAsync(int athleteID)
{
    // Load from web
}
This way, multiple parallel requests for the same athlete will get the same Task<Athlete>, and each athlete is only downloaded once. (Strictly speaking, GetOrAdd may invoke the factory more than once under a race, so a duplicate download can briefly start; only one task is kept, though, and if even that matters you can store Lazy<Task<Athlete>> instead, as the last answer below describes.)
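As a quick illustration (the athlete id 42 is made up), two concurrent callers now share the same cached task, so awaiting both still performs one download:

var t1 = GetAthleteAsync(42);
var t2 = GetAthleteAsync(42);
var athletes = await Task.WhenAll(t1, t2); // a single web request; both entries get the same Athlete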

You also need to evict tasks that completed unsuccessfully, so a failed result isn't cached forever. Here's my snippet:
ObjectCache _cache = MemoryCache.Default;
static object _lockObject = new object();

public Task<T> GetAsync<T>(string cacheKey, Func<Task<T>> func, TimeSpan? cacheExpiration = null) where T : class
{
    var task = (Task<T>)_cache[cacheKey];
    if (task != null) return task;
    lock (_lockObject)
    {
        task = (Task<T>)_cache[cacheKey];
        if (task != null) return task;
        task = func();
        Set(cacheKey, task, cacheExpiration); // Set(...) stores the task in _cache with the given expiration
        task.ContinueWith(t =>
        {
            // Evict faulted or cancelled tasks so the next call can retry
            if (t.Status != TaskStatus.RanToCompletion)
                _cache.Remove(cacheKey);
        });
    }
    return task;
}
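For example, a hypothetical call site for the snippet above (the key format and loader are illustrative) could look like this; concurrent callers share one task, and a faulted task gets evicted so the next call retries:

var athlete = await GetAsync(
    $"athlete:{athleteID}",
    () => LoadAthleteAsync(athleteID),
    TimeSpan.FromMinutes(10));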

When caching values provided by Task objects, you'd like the cache implementation to ensure that:
- No parallel or unnecessary operations to get a value are started. In your case, this is your question about avoiding multiple GetAthleteAsync calls for the same id.
- You don't get negative caching (i.e. caching of failed results), or if you do want it, it's an explicit implementation decision, and you handle eventually replacing failed results somehow.
- Cache users can't get invalidated results from the cache, even if the value is invalidated during an await.
I have a blog post about caching Task-objects with example code, that ensures all points above and could be useful in your situation. Basically my solution is to store Lazy<Task<T>> objects in a MemoryCache.
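The post itself has the full details, but a minimal sketch of the Lazy<Task<T>> idea (my condensed version, not the blog's exact code) looks like this:

private readonly MemoryCache _cache = MemoryCache.Default;

public Task<T> GetOrAddAsync<T>(string key, Func<Task<T>> valueFactory)
{
    var newLazy = new Lazy<Task<T>>(valueFactory, LazyThreadSafetyMode.ExecutionAndPublication);
    // AddOrGetExisting returns null when the key was not present, so fall back to our own Lazy
    var lazy = (Lazy<Task<T>>)_cache.AddOrGetExisting(key, newLazy, new CacheItemPolicy()) ?? newLazy;
    var task = lazy.Value; // the factory runs at most once, even when callers race
    task.ContinueWith(t =>
    {
        // Avoid negative caching: evict failed or cancelled tasks so they can be retried
        if (t.Status != TaskStatus.RanToCompletion)
            _cache.Remove(key);
    }, TaskContinuationOptions.ExecuteSynchronously);
    return task;
}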

Related

Stop Reentrancy on MemoryCache Calls

The app needs to load data and cache it for a period of time. I would expect that if multiple parts of the app want to access the same cache key at the same time, the cache should be smart enough to only load the data once and return the result of that call to all callers. However, MemoryCache is not doing this: if you hit the cache in parallel (which often happens in the app) it creates a task for each attempt to get the cache value.

I thought that this code would achieve the desired result, but it doesn't. I would expect the cache to only run one GetDataAsync task, wait for it to complete, and use the result to get the values for other calls.
using Microsoft.Extensions.Caching.Memory;
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace ConsoleApp4
{
    class Program
    {
        private const string Key = "1";
        private static int number = 0;

        static async Task Main(string[] args)
        {
            var memoryCache = new MemoryCache(new MemoryCacheOptions { });

            var tasks = new List<Task>();
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));
            tasks.Add(memoryCache.GetOrCreateAsync(Key, (cacheEntry) => GetDataAsync()));

            await Task.WhenAll(tasks);

            Console.WriteLine($"The cached value was: {memoryCache.Get(Key)}");
        }

        public static async Task<int> GetDataAsync()
        {
            // Simulate getting a large chunk of data from the database
            await Task.Delay(3000);
            number++;
            Console.WriteLine(number);
            return number;
        }
    }
}
That's not what happens. The above displays these results (not necessarily in this order):
2
1
3
The cached value was: 3
It creates a task for each cache request and discards the values returned from the other two.
This needlessly wastes time, and it makes me wonder whether you can even call this class thread-safe. ConcurrentDictionary has the same behaviour; I tested it and the same thing happens.
Is there a way to achieve the desired behaviour where the task doesn't run 3 times?
MemoryCache leaves it to you to decide how to handle races to populate a cache key. In your case you don't want multiple threads to compete to populate a key, presumably because it's expensive to do that.
To coordinate the work of multiple threads like that you need a lock, but using a C# lock statement in asynchronous code can lead to thread pool starvation. Fortunately, SemaphoreSlim provides a way to do async locking, so it becomes a matter of creating a guarded memory cache that wraps an underlying IMemoryCache.
My first solution had a single semaphore for the entire cache, which put all cache population tasks in a single queue; that isn't very smart, so here instead is a more elaborate solution with a semaphore for each cache key. Another solution could be to have a fixed number of semaphores, picked by a hash of the key.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Memory;

sealed class GuardedMemoryCache : IDisposable
{
    readonly IMemoryCache cache;
    readonly ConcurrentDictionary<object, SemaphoreSlim> semaphores = new();

    public GuardedMemoryCache(IMemoryCache cache) => this.cache = cache;

    public async Task<TItem> GetOrCreateAsync<TItem>(object key, Func<ICacheEntry, Task<TItem>> factory)
    {
        var semaphore = GetSemaphore(key);
        await semaphore.WaitAsync();
        try
        {
            return await cache.GetOrCreateAsync(key, factory);
        }
        finally
        {
            semaphore.Release();
        }
    }

    public object Get(object key) => cache.Get(key);

    public void Dispose()
    {
        foreach (var semaphore in semaphores.Values)
            semaphore.Dispose();
    }

    // One semaphore per key. The semaphores are kept for the lifetime of the cache:
    // removing and disposing them right after each call would race with callers
    // that are still waiting on the same semaphore.
    SemaphoreSlim GetSemaphore(object key) => semaphores.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
}
If multiple threads try to populate the same cache key only a single thread will actually do it. The other threads will instead return the value that was created.
Assuming that you use dependency injection, you can let GuardedMemoryCache implement IMemoryCache by adding a few more methods that forward to the underlying cache to modify the caching behavior throughout your application with very few code changes.
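For reference, IMemoryCache only declares three members besides IDisposable, so the forwarding code mentioned above is small; a sketch of the members you would add to the class:

public ICacheEntry CreateEntry(object key) => cache.CreateEntry(key);
public bool TryGetValue(object key, out object value) => cache.TryGetValue(key, out value);
public void Remove(object key) => cache.Remove(key);

Then declare the class as sealed class GuardedMemoryCache : IMemoryCache (IMemoryCache already extends IDisposable) and register it in the container, e.g. services.AddSingleton<IMemoryCache>(sp => new GuardedMemoryCache(new MemoryCache(new MemoryCacheOptions()))).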
There are different solutions available, the most famous of which is probably LazyCache: it's a great library.
Another one that you may find useful is FusionCache ⚡🦥, which I recently released: it has the exact same feature (although implemented differently) and much more.
The feature you are looking for is described here and you can use it like this:
var result = await fusionCache.GetOrSetAsync(
    Key,
    async _ => await GetDataAsync(),
    TimeSpan.FromMinutes(2)
);
You may also find some of the other features interesting, like fail-safe, advanced timeouts with background factory completion and support for an optional, distributed 2nd level.
If you will give it a chance please let me know what you think.
/shameless-plug

How to limit number of async IO tasks to database?

I have a list of ids and I want to get data for each of those ids from the database in parallel. My ExecuteAsync method below is called at very high throughput, and for each request we have around 500 ids for which I need to extract data.
So I have the below code, where I loop over the list of ids and make an async call for each of them in parallel, and it works fine:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper) where T : class
{
    var tasks = new List<Task<T>>(ids.Count);
    // invoking multiple ids in parallel to get data for each id from the database
    for (int i = 0; i < ids.Count; i++)
    {
        var id = ids[i]; // copy to a local: the lambda may run again later (e.g. if the policy retries)
        tasks.Add(Execute(policy, ct => mapper(ct, id)));
    }

    // wait for all id responses to come back
    var responses = await Task.WhenAll(tasks);

    var excludeNull = new List<T>(ids.Count);
    for (int i = 0; i < responses.Length; i++)
    {
        var response = responses[i];
        if (response != null)
        {
            excludeNull.Add(response);
        }
    }
    return excludeNull;
}

private async Task<T> Execute<T>(IPollyPolicy policy,
    Func<CancellationToken, Task<T>> requestExecuter) where T : class
{
    var response = await policy.Policy.ExecuteAndCaptureAsync(
        ct => requestExecuter(ct), CancellationToken.None);
    if (response.Outcome == OutcomeType.Failure)
    {
        if (response.FinalException != null)
        {
            // log error
            throw response.FinalException;
        }
    }
    return response?.Result;
}
Question:
Now, as you can see, I am looping over all ids and making a bunch of async calls to the database in parallel for each id, which can put a lot of load on the database (depending on how many requests are coming in). So I want to limit the number of async calls we make to the database. I modified ExecuteAsync to use SemaphoreSlim as shown below, but it doesn't look like it does what I want it to do:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper) where T : class
{
    var throttler = new SemaphoreSlim(250);
    var tasks = new List<Task<T>>(ids.Count);
    // invoking multiple ids in parallel to get data for each id from the database
    for (int i = 0; i < ids.Count; i++)
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        try
        {
            tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
        }
        finally
        {
            throttler.Release();
        }
    }

    // wait for all id responses to come back
    var responses = await Task.WhenAll(tasks);

    // same excludeNull code check here
    return excludeNull;
}
Does a semaphore work on threads or tasks? From what I've read here, it looks like Semaphore is for threads and SemaphoreSlim is for tasks.
Is this correct? If yes, what's the best way to fix this and limit the number of async IO calls we make to the database?
Task is an abstraction over threads and doesn't necessarily create a new thread. The semaphore limits the number of threads that can access that for loop. Execute returns a Task, which isn't a thread. If there's only 1 request, there will be only 1 thread inside that for loop, even if it is asking for 500 ids; that 1 thread sends off all the async IO tasks itself.
Sort of. I would not say that tasks are related to threads at all. There are actually two kinds of tasks: a delegate task (which is kind of an abstraction of a thread), and a promise task (which has nothing to do with threads).
Regarding the SemaphoreSlim, it does limit the concurrency of a block of code (not threads).
I recently started playing with C#, so it looks like my understanding of threads and tasks is not right.
I recommend reading my async intro and best practices. Follow up with There Is No Thread if you're interested more about how threads aren't really involved.
I modified ExecuteAsync to use Semaphore as shown below but it doesn't look like it does what I want it to do
The current code is only throttling the adding of the tasks to the list, which is only done one at a time anyway. What you want to do is throttle the execution itself:
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy, Func<CancellationToken, int, Task<T>> mapper) where T : class
{
    var throttler = new SemaphoreSlim(250);
    var tasks = new List<Task<T>>(ids.Count);
    // invoking multiple ids in parallel to get data for each id from the database
    for (int i = 0; i < ids.Count; i++)
        tasks.Add(ThrottledExecute(ids[i]));

    // wait for all id responses to come back
    var responses = await Task.WhenAll(tasks);

    // same excludeNull code check here
    return excludeNull;

    async Task<T> ThrottledExecute(int id)
    {
        await throttler.WaitAsync().ConfigureAwait(false);
        try
        {
            return await Execute(policy, ct => mapper(ct, id)).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    }
}
Your colleague probably has in mind the Semaphore class, which is indeed a thread-centric throttler with no asynchronous capabilities.
Limits the number of threads that can access a resource or pool of resources concurrently.
The SemaphoreSlim class is a lightweight alternative to Semaphore, which includes the asynchronous method WaitAsync, that makes all the difference in the world. The WaitAsync doesn't block a thread, it blocks an asynchronous workflow. Asynchronous workflows are cheap (usually less than 1000 bytes each). You can have millions of them "running" concurrently at any given moment. This is not the case with threads, because of the 1 MB of memory that each thread reserves for its stack.
As for the ExecuteAsync method, here is how you could refactor it by using the LINQ methods Select, Where, ToArray and ToList:
Update: The Polly library supports capturing and continuing on the current synchronization context, so I added a bool executeOnCurrentContext argument to the API. I also renamed the asynchronous Execute method to ExecuteAsync, to be on par with the guidelines.
private async Task<List<T>> ExecuteAsync<T>(IList<int> ids, IPollyPolicy policy,
    Func<CancellationToken, int, Task<T>> mapper,
    int concurrencyLevel = 1, bool executeOnCurrentContext = false) where T : class
{
    var throttler = new SemaphoreSlim(concurrencyLevel);
    Task<T>[] tasks = ids.Select(async id =>
    {
        await throttler.WaitAsync().ConfigureAwait(executeOnCurrentContext);
        try
        {
            return await ExecuteAsync(policy, ct => mapper(ct, id),
                executeOnCurrentContext).ConfigureAwait(false);
        }
        finally
        {
            throttler.Release();
        }
    }).ToArray();

    T[] results = await Task.WhenAll(tasks).ConfigureAwait(false);
    return results.Where(r => r != null).ToList();
}

private async Task<T> ExecuteAsync<T>(IPollyPolicy policy,
    Func<CancellationToken, Task<T>> function,
    bool executeOnCurrentContext = false) where T : class
{
    var response = await policy.Policy.ExecuteAndCaptureAsync(
        ct => executeOnCurrentContext ? function(ct) : Task.Run(() => function(ct)),
        CancellationToken.None, continueOnCapturedContext: executeOnCurrentContext)
        .ConfigureAwait(executeOnCurrentContext);
    if (response.Outcome == OutcomeType.Failure)
    {
        if (response.FinalException != null)
        {
            // Rethrow while preserving the original stack trace (System.Runtime.ExceptionServices)
            ExceptionDispatchInfo.Throw(response.FinalException);
        }
    }
    return response?.Result;
}
You are throttling the rate at which you add tasks to the list. You are not throttling the rate at which tasks are executed. To do that, you'd probably have to implement your semaphore calls inside the Execute method itself.
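For illustration, moving the throttling inside Execute could look roughly like this (the semaphore becomes a field shared by all calls; 250 is the limit from your code):

private readonly SemaphoreSlim _throttler = new SemaphoreSlim(250);

private async Task<T> Execute<T>(IPollyPolicy policy,
    Func<CancellationToken, Task<T>> requestExecuter) where T : class
{
    await _throttler.WaitAsync().ConfigureAwait(false);
    try
    {
        var response = await policy.Policy.ExecuteAndCaptureAsync(
            ct => requestExecuter(ct), CancellationToken.None);
        if (response.Outcome == OutcomeType.Failure && response.FinalException != null)
        {
            throw response.FinalException;
        }
        return response?.Result;
    }
    finally
    {
        _throttler.Release();
    }
}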
If you can't modify Execute, another way to do it is to poll for completed tasks, sort of like this:
for (int i = 0; i < ids.Count; i++)
{
    // Recheck the pending count on every iteration, yielding until a slot frees up
    while (tasks.Count(t => !t.IsCompleted) >= 500)
        await Task.Yield();
    tasks.Add(Execute(policy, ct => mapper(ct, ids[i])));
}
await Task.WhenAll(tasks);
Actually the TPL is capable of controlling task execution and limiting concurrency; you can test how many parallel tasks are suitable for your use-case. No need to think about threads, the TPL will manage everything fine for you.
To use limited concurrency, see this answer, credits to #panagiotis-kanavos:
.Net TPL: Limited Concurrency Level Task scheduler with task priority?
The example code is (it even uses different priorities, which you can strip out):
QueuedTaskScheduler qts = new QueuedTaskScheduler(TaskScheduler.Default, 4);
TaskScheduler pri0 = qts.ActivateNewQueue(priority: 0);
TaskScheduler pri1 = qts.ActivateNewQueue(priority: 1);

Task.Factory.StartNew(() => { },
    CancellationToken.None,
    TaskCreationOptions.None,
    pri0);
Just throw all your tasks to the queue and with Task.WhenAll you can wait till everything is done.
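Applied to the question, that could look roughly like this (QueuedTaskScheduler comes from the ParallelExtensionsExtras package; GetData stands in for a synchronous database call, since a TaskScheduler limits the concurrency of the delegates it schedules):

var qts = new QueuedTaskScheduler(TaskScheduler.Default, 4);
TaskScheduler scheduler = qts.ActivateNewQueue(priority: 0);

var tasks = ids.Select(id =>
    Task.Factory.StartNew(() => GetData(id),
        CancellationToken.None,
        TaskCreationOptions.None,
        scheduler)).ToList();

var results = await Task.WhenAll(tasks);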

C# Add to a List Asynchronously in API

I have an API which needs to be run in a loop for mass processing.
The current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch/modify the original single API. However, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
My current solution below just builds up a list; would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>();
    foreach (AddressRequestDto request in requests)
    {
        var newCombinedAddress = (await GetCombinedAddress(request)).Value;
        combinedAddressResponses.Add(newCombinedAddress);
    }
    return combinedAddressResponses;
}
Update:
In the debugger I have to drill down to combinedAddressResponse.Result.Value; combinedAddressResponse.Value is null. Also, strangely, writing combinedAddressResponse.Result.Value in code gives the error below: "ActionResult does not contain a definition for 'Value' and no accessible extension method"
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
    var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
    foreach (var request in requests)
    {
        tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
    }
    // This waits for all the tasks to complete
    await Task.WhenAll(tasks);
    combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
    return combinedAddressResponses;
}
"looking for a way to speed things up and run in parallel" - thanks
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
    // Build the collection of tasks by doing an asynchronous operation for each request.
    var tasks = requests.Select(async request =>
    {
        var combinedAddressResponse = await GetCombinedAddress(request);
        return combinedAddressResponse.Value;
    }).ToList();

    // Wait for all the tasks to complete and get the results.
    var results = await Task.WhenAll(tasks);
    return results;
}

How can I asynchronously transform one IEnumerable to another, just like LINQ's Select(), but using await on every transformed item?

Consider this situation:
class Product { }

interface IWorker
{
    Task<Product> CreateProductAsync();
}
I am now given an IEnumerable<IWorker> workers and am supposed to create an IEnumerable<Product> from it that I have to pass to some other function that I cannot alter:
void CheckProducts(IEnumerable<Product> products);
This method needs to have access to the entire IEnumerable<Product>. It is not possible to subdivide it and call CheckProducts on multiple subsets.
One obvious solution is this:
CheckProducts(workers.Select(worker => worker.CreateProductAsync().Result));
But this is blocking, of course, and hence it would only be my last resort.
Syntactically, I need precisely this, just without blocking.
I cannot use await inside of the function I'm passing to Select() as I would have to mark it as async and that would require it to return a Task itself and I would have gained nothing. In the end I need an IEnumerable<Product> and not an IEnumerable<Task<Product>>.
It is important to know that the order of the workers creating their products does matter, their work must not overlap. Otherwise, I would do this:
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
    var tasks = workers.Select(worker => worker.CreateProductAsync());
    return await Task.WhenAll(tasks);
}
But unfortunately, Task.WhenAll() executes some tasks in parallel while I need them executed sequentially.
Here is one possibility to implement it if I had an IReadOnlyList<IWorker> instead of an IEnumerable<IWorker>:
async Task<IEnumerable<Product>> CreateProductsAsync(IReadOnlyList<IWorker> workers)
{
    var resultList = new Product[workers.Count];
    for (int i = 0; i < resultList.Length; ++i)
        resultList[i] = await workers[i].CreateProductAsync();
    return resultList;
}
But I must deal with an IEnumerable and, even worse, it is usually quite huge, sometimes it is even unlimited, yielding workers forever. If I knew that its size was decent, I would just call ToArray() on it and use the method above.
The ultimate solution would be this:
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
    foreach (var worker in workers)
        yield return await worker.CreateProductAsync();
}
But yield and await are incompatible as described in this answer. Looking at that answer, would that hypothetical IAsyncEnumerator help me here? Does something similar meanwhile exist in C#?
A summary of the issues I'm facing:
- I have a potentially endless IEnumerable<IWorker>.
- I want to asynchronously call CreateProductAsync() on each of them, in the same order as they are coming in.
- In the end I need an IEnumerable<Product>.
A summary of what I already tried that doesn't work:
- I cannot use Task.WhenAll() because it executes the tasks in parallel.
- I cannot use ToArray() and process that array manually in a loop, because my sequence is sometimes endless.
- I cannot use yield return because it's incompatible with await.
Does anybody have a solution or workaround for me?
Otherwise I will have to use that blocking code...
IEnumerator<T> is a synchronous interface, so blocking is unavoidable if CheckProducts enumerates the next product before the next worker has finished creating the product.
Nevertheless, you can achieve parallelism by creating products on another thread, adding them to a BlockingCollection<T>, and yielding them on the main thread:
static IEnumerable<Product> CreateProducts(IEnumerable<IWorker> workers)
{
    var products = new BlockingCollection<Product>(3);
    Task.Run(async () => // On the thread pool...
    {
        foreach (IWorker worker in workers)
        {
            Product product = await worker.CreateProductAsync(); // Create products serially.
            products.Add(product); // Enqueue the product, blocking if the queue is full.
        }
        products.CompleteAdding(); // Notify GetConsumingEnumerable that we're done.
    });
    return products.GetConsumingEnumerable();
}
To avoid unbounded memory consumption, you can optionally specify the capacity of the queue as a constructor argument to BlockingCollection<T>. I used 3 in the code above.
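Usage is then just a matter of handing the lazy enumerable to the consumer:

CheckProducts(CreateProducts(workers)); // enumerates (and blocks) on the caller's thread while products are created on the pool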
The Situation:
Here you're saying you need to do this synchronously, because IEnumerable doesn't support async and the requirements say you need an IEnumerable<Product>:
"I am now given an IEnumerable<IWorker> workers and am supposed to create an IEnumerable<Product> from it that I have to pass to some other function that I cannot alter."
Here you say the entire product set needs to be processed at the same time, presumably in a single call to void CheckProducts(IEnumerable<Product> products):
"This method needs to check the entire Product set as a whole. It is not possible to subdivide the result."
And here you say the enumerable can yield an indefinite number of items:
"But I must deal with an IEnumerable and, even worse, it is usually quite huge, sometimes it is even unlimited, yielding workers forever. If I knew that its size was decent, I would just call ToArray() on it and use the method above."
So let's put these together. You need to do asynchronous processing of an indefinite number of items within a synchronous environment, and then evaluate the entire set as a whole... synchronously.
The Underlying Problems:
1: To evaluate a set as a whole, it must be completely enumerated, and to completely enumerate a set, it must be finite. Therefore it is impossible to evaluate an infinite set as a whole.
2: Switching back and forth between sync and async forces the async code to run synchronously. That might be OK from a requirements perspective, but from a technical perspective it can cause deadlocks (maybe unavoidable, I don't know. Look that up. I'm not the expert).
Possible Solutions to Problem 1:
1: Force the source to be an ICollection<T> instead of IEnumerable<T>. This enforces finiteness.
2: Alter the CheckProducts algorithm to process iteratively, potentially yielding intermediary results while still maintaining an ongoing aggregation internally.
Possible Solutions to Problem 2:
1: Make the CheckProducts method asynchronous.
2: Make the CreateProduct... method synchronous.
Bottom Line
You can't do what you're asking, the way you're asking it, and it sounds like someone else is dictating your requirements. They need to change some of the requirements, because what they're asking for is (and I really hate using this word) impossible. Is it possible you have misinterpreted some of the requirements?
Two ideas for you, OP.
Multiple call solution
If you are allowed to call CheckProducts more than once, you could simply do this:
foreach (var worker in workers)
{
    var product = await worker.CreateProductAsync();
    CheckProducts(new[] { product });
}
If it adds value, I'm pretty sure you could work out a way to do it in batches of, say, 100 at a time, too.
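A sketch of that batching variant (the batch size of 100 is arbitrary):

var batch = new List<Product>(100);
foreach (var worker in workers)
{
    batch.Add(await worker.CreateProductAsync());
    if (batch.Count == 100)
    {
        CheckProducts(batch);
        batch.Clear();
    }
}
if (batch.Count > 0) CheckProducts(batch); // flush the final partial batch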
Thread pool solution
If you are not allowed to call CheckProducts more than once, and not allowed to modify CheckProducts, there is no way to force it to yield control and allow other continuations to run. So no matter what you do, you cannot force asynchronousness into the IEnumerable that you pass to it, not just because of the compiler checking, but because it would probably deadlock.
So here is a thread pool solution. The idea is to create one separate thread to process the products in series; the processor is async, so a call to CreateProductAsync() will still yield control to anything else that has been posted to the synchronization context, as needed. However, it can't magically force CheckProducts to give up control, so there is still some possibility that it will block occasionally if it is able to check products faster than they are created. In my example I'm using Monitor.Wait() so the OS won't schedule the thread until there is something for it to do. You'll still be using up a thread resource while it blocks, but at least you won't be wasting CPU time in a busy-wait loop.
public static IEnumerable<Product> CreateProducts(IEnumerable<IWorker> workers)
{
    var queue = new ConcurrentQueue<Product>();
    var task = Task.Run(() => ConvertProducts(workers.GetEnumerator(), queue));
    while (true)
    {
        while (queue.TryDequeue(out var product))
        {
            yield return product;
        }
        if (task.IsCompleted && queue.IsEmpty) yield break;
        // Monitor.Wait must be called while holding the lock; the timeout guards
        // against a pulse that fires between the check above and the wait.
        lock (queue) Monitor.Wait(queue, 1000);
    }
}

private static async Task ConvertProducts(IEnumerator<IWorker> input, ConcurrentQueue<Product> output)
{
    while (input.MoveNext())
    {
        var current = input.Current;
        var product = await current.CreateProductAsync();
        output.Enqueue(product);
        lock (output) Monitor.Pulse(output);
    }
}
From your requirements I can put together the following:
1) Workers are processed in order
2) We are open to receive new Workers at any time
So we can use the fact that a dataflow TransformBlock has a built-in queue and processes items in order; that lets us accept Workers from the producer at any time.
Next we make the result of the TransformBlock observable, so that the consumer can consume Products on demand.
I made some quick changes and started the consumer portion. It simply takes the observable produced by the Transformer and maps it to an enumerable that yields each product. For background, here is ToEnumerable():
"The ToEnumerable operator returns an enumerator from an observable sequence. The enumerator will yield each item in the sequence as it is produced."
Source
using System;
using System.Collections.Generic;
using System.Reactive.Linq; // for ToEnumerable()
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

namespace ClassLibrary1
{
    public class WorkerProducer
    {
        private readonly ProductTransformer transformer;

        public WorkerProducer(ProductTransformer transformer) => this.transformer = transformer;

        public async Task ProduceWorker()
        {
            await transformer.Transformer.SendAsync(new Worker());
        }
    }

    public class ProductTransformer
    {
        public IObservable<Product> Products { get; private set; }
        public TransformBlock<Worker, Product> Transformer { get; private set; }

        private Task<Product> CreateProductAsync(Worker worker) => Task.FromResult(new Product());

        public ProductTransformer()
        {
            Transformer = new TransformBlock<Worker, Product>(wrk => CreateProductAsync(wrk));
            Products = Transformer.AsObservable();
        }
    }

    public class ProductConsumer
    {
        private ThirdParty ThirdParty { get; set; } = new ThirdParty();
        private ProductTransformer Transformer { get; set; }

        public ProductConsumer(ProductTransformer transformer)
        {
            Transformer = transformer;
            ThirdParty.CheckProducts(Transformer.Products.ToEnumerable());
        }
    }

    public class Worker { }
    public class Product { }

    public class ThirdParty
    {
        public void CheckProducts(IEnumerable<Product> products)
        {
        }
    }
}
Unless I misunderstood something, I don't see why you don't simply do it like this:

var productList = new List<Product>(workers.Count());
foreach (var worker in workers)
{
    productList.Add(await worker.CreateProductAsync());
}
CheckProducts(productList);
What about if you simply keep clearing a List of size 1?
var productList = new List<Product>(1);
var checkTask = Task.CompletedTask;
foreach (var worker in workers)
{
    await checkTask;
    productList.Clear();
    productList.Add(await worker.CreateProductAsync());
    checkTask = Task.Run(() => CheckProducts(productList));
}
await checkTask;
You can use Task.WhenAll, but instead of returning the result of Task.WhenAll, return the collection of tasks transformed into a collection of results.
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
    var tasks = workers.Select(worker => worker.CreateProductAsync()).ToList();
    await Task.WhenAll(tasks);
    return tasks.Select(task => task.Result);
}
The order of the tasks will be preserved.
And it seems like it should also be OK to go with just return await Task.WhenAll(). From the docs of Task.WhenAll Method (IEnumerable<Task<TResult>>):
The Task.Result property of the returned task will be set to
an array containing all of the results of the supplied tasks in the
same order as they were provided...
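So, going by that quote from the docs, the whole method collapses to this (same types as in the question):

async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
    return await Task.WhenAll(workers.Select(worker => worker.CreateProductAsync()));
}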
If the workers need to be executed one by one in the order they were created, and given the requirement that another function needs the whole set of worker results:
async Task<IEnumerable<Product>> CreateProductsAsync(IEnumerable<IWorker> workers)
{
    var products = new List<Product>();
    foreach (var worker in workers)
    {
        var product = await worker.CreateProductAsync();
        products.Add(product);
    }
    return products;
}
You can do this now with async, IEnumerable and LINQ, but every method in the chain after the async one deals in Task<T>, and you need to use something like await Task.WhenAll at the end. You can use async lambdas in the LINQ methods, which return Task<T>; you don't need to wait synchronously in these.
The Select will start your tasks sequentially, i.e. they won't even exist as tasks until the Select enumerates each one, and it won't keep going after you stop enumerating. You could also run your own foreach over the enumerable of tasks if you want to await them all individually.
You can break out of this like any other foreach without it starting all of them, so this will also work on an infinite enumerable.
public async Task Main()
{
    // This async method call could also be an async lambda
    foreach (var task in GetTasks())
    {
        var result = await task;
        Console.WriteLine($"Result is {result}");
        if (result > 5) break;
    }
}

private IEnumerable<Task<int>> GetTasks()
{
    return GetNumbers().Select(WaitAndDoubleAsync);
}

private async Task<int> WaitAndDoubleAsync(int i)
{
    Console.WriteLine($"Waiting {i} seconds asynchronously");
    await Task.Delay(TimeSpan.FromSeconds(i));
    return i * 2;
}

/// Keeps yielding numbers
private IEnumerable<int> GetNumbers()
{
    var i = 0;
    while (true) yield return i++;
}
This outputs the following, then stops:
Waiting 0 seconds asynchronously
Result is 0
Waiting 1 seconds asynchronously
Result is 2
Waiting 2 seconds asynchronously
Result is 4
Waiting 3 seconds asynchronously
Result is 6
The important thing is that you can't mix yield and await in the same method, but you can yield Tasks returned from a method that uses await absolutely fine, so you can use them together just by splitting them into separate methods. Select is already a method that uses yield, so you may not need to write your own method for this.
In your post you were looking for a Task<IEnumerable<Product>>, but what you can actually use is an IEnumerable<Task<Product>>.
You can go even further with this e.g. if you had something like a REST API where one resource can have links to other resources, like if you just wanted to get a list of users of a group, but stop when you found the user you were interested in:
public async Task<IEnumerable<Task<User>>> GetUserTasksAsync(int groupId)
{
    var group = await GetGroupAsync(groupId);
    return group.UserIds.Select(GetUserAsync);
}

foreach (var task in await GetUserTasksAsync(1))
{
    var user = await task;
    ...
}
There is no solution to your problem. You can't transform a deferred IEnumerable<Task<Product>> to a deferred IEnumerable<Product>, such that the consuming thread will not get blocked while enumerating the IEnumerable<Product>. The IEnumerable<T> is a synchronous interface. It returns an enumerator with a synchronous MoveNext method. The MoveNext returns bool, which is not an awaitable type. An asynchronous interface IAsyncEnumerable<T> exists, whose enumerator has an asynchronous MoveNextAsync method, with a return type of ValueTask<bool>. But you have explicitly said that you can't change the consuming method, so you are stuck with the IEnumerable<T> interface. No solution then.
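(If the consumer could ever be changed to take an IAsyncEnumerable<Product>, the question's "ultimate solution" becomes legal as a C# 8 async iterator, where yield return and await do mix; a sketch:

async IAsyncEnumerable<Product> CreateProductsAsync(IEnumerable<IWorker> workers)
{
    foreach (var worker in workers)
        yield return await worker.CreateProductAsync();
}

But as long as CheckProducts demands a plain IEnumerable<Product>, this doesn't help.)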
Try:

workers.ToList().ForEach(async wrkr =>
{
    var product = await wrkr.CreateProductAsync();
    // Remaining tasks...
});

Use a Task to avoid multiple calls to expensive operation and to cache its result

I have an async method that fetches some data from a database. This operation is fairly expensive, and takes a long time to complete. As a result, I'd like to cache the method's return value. However, it's possible that the async method will be called multiple times before its initial execution has a chance to return and save its result to the cache, resulting in multiple calls to this expensive operation.
To avoid this, I'm currently reusing a Task, like so:
public class DataAccess
{
    private Task<MyData> _getDataTask;

    public async Task<MyData> GetDataAsync()
    {
        if (_getDataTask == null)
        {
            _getDataTask = Task.Run(() => synchronousDataAccessMethod());
        }
        return await _getDataTask;
    }
}
My thought is that the initial call to GetDataAsync will kick off the synchronousDataAccessMethod method in a Task, and any subsequent calls to this method before the Task has completed will simply await the already running Task, automatically avoiding calling synchronousDataAccessMethod more than once. Calls made to GetDataAsync after the private Task has completed will cause the Task to be awaited, which will immediately return the data from its initial execution.
This seems to be working, but I'm having some strange performance issues that I suspect may be tied to this approach. Specifically, awaiting _getDataTask after it has completed takes several seconds (and locks the UI thread), even though the synchronousDataAccessMethod call is not called.
Am I misusing async/await? Is there a hidden gotcha that I'm not seeing? Is there a better way to accomplish the desired behavior?
EDIT
Here's how I call this method:
var result = (await myDataAccessObject.GetDataAsync()).ToList();
Maybe it has something to do with the fact that the result is not immediately enumerated?
If you want to await it further up the call stack, I think you want this:
public class DataAccess
{
    private Task<MyData> _getDataTask;
    private readonly object lockObj = new Object();

    public async Task<MyData> GetDataAsync()
    {
        lock (lockObj)
        {
            if (_getDataTask == null)
            {
                _getDataTask = Task.Run(() => synchronousDataAccessMethod());
            }
        }
        return await _getDataTask;
    }
}
Your original code has the potential for this happening:
- Thread 1 sees that _getDataTask == null, and begins constructing the task
- Thread 2 sees that _getDataTask == null, and begins constructing the task
- Thread 1 finishes constructing the task, which starts, and Thread 1 waits on that task
- Thread 2 finishes constructing a task, which starts, and Thread 2 waits on that task
You end up with two instances of the task running.
Use a lock to prevent multiple calls to the database query section. A lock makes it thread-safe, so that once the task has been cached, all the other calls will use it instead of running to the database for fulfillment.

lock (StaticObject) // Use a static object so there is only one lock for this routine
{
    if (_getDataTask == null)
    {
        // Get data code here
    }
    return _getDataTask;
}
Please rewrite your function as:
public Task<MyData> GetDataAsync()
{
    if (_getDataTask == null)
    {
        _getDataTask = Task.Run(() => synchronousDataAccessMethod());
    }
    return _getDataTask;
}
This should not change at all the things that can be done with this function - you can still await on the returned task!
Please tell me if that changes anything.
Bit late to answer this, but there is an open source library called LazyCache that will do this for you in two lines of code, and it was recently updated to handle caching Tasks for just this sort of situation. It is also available on NuGet.
Example:
Func<Task<List<MyData>>> cacheableAsyncFunc = () => myDataAccessObject.GetDataAsync();

var cachedData = await cache.GetOrAddAsync("myDataAccessObject.GetData", cacheableAsyncFunc);
return cachedData;

// Or instead just do it all in one line if you prefer
// return await cache.GetOrAddAsync("myDataAccessObject.GetData", myDataAccessObject.GetDataAsync);
It has built-in locking by default, so the cacheable method will only execute once per cache miss, and it uses a lambda so you can do "get or add" in one go. It defaults to a 20-minute sliding expiration, but you can set whatever caching policy you like on it.
More info on caching tasks is in the API docs, and you may find the sample app that demos caching tasks useful.
(Disclaimer: I am the author of LazyCache)
