What are best practices / good patterns for managing cached async data? - c#

I am rewriting an old app and I am trying to use async to speed it up.
The old code was doing something like this:
var value1 = getValue("key1");
var value2 = getValue("key2");
var value3 = getValue("key3");
where the getValue function was managing its own cache in a dictionary, doing something like this:
object getValue(string key) {
if (cache.ContainsKey(key)) return cache[key];
var value = callSomeHttpEndPointsAndCalculateTheValue(key);
cache.Add(key, value);
return value;
}
If I make getValue async and await every call to it, then everything works well. But it is not faster than the old version, because the calls still run one after another, as they used to.
If I remove the await (well, if I postpone it, but that's not the focus of this question), I finally get the slow stuff to run in parallel. But if a second call to getValue("key1") is executed before the first call has finished, I end up with executing the same slow call twice and everything is slower than the old version, because it doesn't take advantage of the cache.
Is there something like await("key1") that will only await if a previous call with "key1" is still awaiting?
EDIT (follow-up to a comment)
By "speed it up" I mean more responsive.
For example, when the user selects a material in a drop-down, I want to update the list of available thicknesses or colors in other drop-downs and other material properties in other UI elements. Sometimes this triggers a cascade of events that requires the same getValue("key") to be used more than once.
For example when the material is changed, a few functions may be called: updateThicknesses(), updateHoleOffsets(), updateMaxWindLoad(), updateMaxHoleDistances(), etc. Each function reads the values from the UI elements and decides whether to do its own slow calculations independently from the other functions. Each function can require a few http calls to calculate some parameters, and some of those parameters may be required by several functions.
The old implementation was calling the functions in sequence, so the second function would take advantage of some values cached while processing the first one. The user would see each section of the interface updating in sequence over 5-6 seconds the first time and very quickly the following times, unless the new value required some new http endpoint calls.
The new async implementation calls all the functions at the same time, so every function ends up calling the same http endpoints because their results are not yet cached.

A simple method is to cache the tasks instead of the values. This way you can await both a pending task and an already completed task to get the value.
If several parallel callers all try to get a value using the same key, only the first will start the task; the others will await the same task.
Here's a simple implementation:
private Dictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
lock (cache)
{
if (!cache.TryGetValue(key, out var result))
cache[key] = result = callSomeHttpEndPointsAndCalculateTheValueAsync(key);
return result;
}
}
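As a rough usage sketch of the lock-based version above: starting several lookups before awaiting any of them should trigger only one slow call per key. The TaskCache class, the FakeSlowCallAsync helper and the SlowCalls counter are hypothetical stand-ins for the real HTTP work, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class TaskCache
{
    private readonly Dictionary<string, Task<object>> cache = new();
    private int slowCalls; // counts how often the slow path really runs

    public int SlowCalls => slowCalls;

    public Task<object> GetValueAsync(string key)
    {
        lock (cache)
        {
            // Only the first caller for a key starts the work;
            // later callers receive the same (pending or completed) task.
            if (!cache.TryGetValue(key, out var result))
                cache[key] = result = FakeSlowCallAsync(key);
            return result;
        }
    }

    // Hypothetical stand-in for callSomeHttpEndPointsAndCalculateTheValueAsync.
    private async Task<object> FakeSlowCallAsync(string key)
    {
        Interlocked.Increment(ref slowCalls);
        await Task.Delay(200);
        return "value for " + key;
    }
}

public static class TaskCacheDemo
{
    public static async Task Main()
    {
        var cache = new TaskCache();
        var t1 = cache.GetValueAsync("key1");
        var t2 = cache.GetValueAsync("key1"); // issued while t1 is still pending
        var t3 = cache.GetValueAsync("key2");
        await Task.WhenAll(t1, t2, t3);
        Console.WriteLine(ReferenceEquals(t1, t2)); // True: both callers share one task
        Console.WriteLine(cache.SlowCalls);         // 2: one slow call per distinct key
    }
}
```

Note that a failed task would stay in the dictionary in this sketch, so a real cache would probably also want to evict faulted entries.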
Judging by the comments, the following example should probably not be used.
Since ConcurrentDictionary has been mentioned, here's a version using that instead.
private ConcurrentDictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
return cache.GetOrAdd(key, k => callSomeHttpEndPointsAndCalculateTheValueAsync(k));
}
The method is simpler, and that alone might be grounds for switching to it, but in my experience ConcurrentDictionary and the other ConcurrentXXX collections have their niche uses and seem somewhat more heavy-handed, and thus slower, for the basic stuff.

Related

How does this ConcurrentDictionary + Lazy<Task<T>> code work?

There are various posts/answers that say that the .NET/.NET Core ConcurrentDictionary GetOrAdd method is not thread-safe when the Func delegate is used to calculate the value to insert into the dictionary, if the key didn't already exist.
My understanding is that the factory delegate passed to GetOrAdd can be called multiple times "at the same time/in really quick succession" if a number of requests occur at the "same time". This could be wasteful, especially if the call is "expensive". (@panagiotis-kanavos explains this better than I do.) With this assumption, I'm struggling to understand how some sample code I made seems to work.
I've created a working sample on .NET Fiddle but I'm stuck trying to understand how it works.
A common suggestion I've read is to store a Lazy<Task<T>> value in the ConcurrentDictionary. The idea is that the Lazy prevents other calls from executing the underlying method.
The main part of the code which does the heavy lifting is this:
public static async Task<DateTime> GetDateFromCache()
{
var result = await _cache.GetOrAdd("someDateTime", new Lazy<Task<DateTime>>(async () =>
{
// NOTE: i've made this method take 2 seconds to run, each time it's called.
var someData = await GetDataFromSomeExternalDependency();
return DateTime.UtcNow;
})).Value;
return result;
}
This is how I read this:
Check if someDateTime key exists in the dictionary.
If yes, return that. <-- That's a thread-safe atomic action. Yay!
If no, then here we go ....
Create an instance of a Lazy<Task<DateTime>> (which is basically instant)
Return that Lazy instance. (so far, the actual 'expensive' operation hasn't been called, yet.)
Now get the Value, which is a Task<DateTime>.
Now await this task .. which finally does the 'expensive' call. It waits 2 seconds .. and then returns the result (some point in Time).
Now this is where I'm all wrong. I'm assuming (above) that the value in the key/value pair is a Lazy<Task<DateTime>>, which the await would call each time. If the await is called one at a time (because the Lazy protects other callers from all calling at the same time), then I would have thought that the result would be a different DateTime with each independent call.
So can someone please explain where I'm wrong in my thinking?
(Please refer to the full running code on .NET Fiddle.)
Because I'm assuming (above) that the value in the key/value is a Lazy<Task<DateTime>>
Yes, that is true.
which the await would call each time. If the await is called one at a time (because the Lazy protects other callers from all calling at the same time), then I would have thought that the result would be a different DateTime with each independent call.
await is not a call; it is more like "continue execution when the result is available". Accessing Lazy.Value creates the task, which initiates the call to GetDataFromSomeExternalDependency that eventually returns the DateTime. Because the Lazy creates the task only once, every later access to Value returns that same task, and you can await a task however many times you want and get the same result.
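A minimal sketch of that behaviour, assuming a 100 ms Task.Delay as a hypothetical stand-in for GetDataFromSomeExternalDependency and a counter (factoryRuns) added purely for illustration:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LazyTaskDemo
{
    private static int factoryRuns;

    // The factory runs only when .Value is first read;
    // afterwards the Lazy hands back the same Task instance.
    private static readonly Lazy<Task<DateTime>> lazyDate =
        new Lazy<Task<DateTime>>(async () =>
        {
            Interlocked.Increment(ref factoryRuns);
            await Task.Delay(100); // hypothetical stand-in for the external dependency
            return DateTime.UtcNow;
        });

    public static async Task Main()
    {
        DateTime first = await lazyDate.Value;  // starts the work and waits for it
        await Task.Delay(50);
        DateTime second = await lazyDate.Value; // same completed task, no new work

        Console.WriteLine(first == second); // True
        Console.WriteLine(factoryRuns);     // 1
    }
}
```

The second await completes immediately because the task has already finished; it only reads the stored result.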

ConcurrentDictionary GetOrAdd async

I want to use something like GetOrAdd with a ConcurrentDictionary as a cache to a webservice. Is there an async version of this dictionary? GetOrAdd will be making a web request using HttpClient, so it would be nice if there was a version of this dictionary where GetOrAdd was async.
To clear up some confusion, the contents of the dictionary will be the response from a call to a webservice.
ConcurrentDictionary<string, Response> _cache
= new ConcurrentDictionary<string, Response>();
var response = _cache.GetOrAdd("id",
x => _httpClient.GetAsync(x).GetAwaiter().GetResult());
GetOrAdd won't become an asynchronous operation because accessing the value of a dictionary isn't a long running operation.
What you can do however is simply store tasks in the dictionary, rather than the materialized result. Anyone needing the results can then await that task.
However, you also need to ensure that the operation is only ever started once. GetOrAdd may invoke its factory more than once under contention, so wrap the task in a Lazy whose delegate starts the request only when Value is first read:
ConcurrentDictionary<string, Lazy<Task<Response>>> _cache = new ConcurrentDictionary<string, Lazy<Task<Response>>>();
var response = await _cache.GetOrAdd("id", url => new Lazy<Task<Response>>(() => _httpClient.GetAsync(url))).Value;
The GetOrAdd method is not that great to use for this purpose. Since it does not guarantee that the factory runs only once, the only purpose it has is a minor optimization (minor since additions are rare anyway) in that it doesn't need to hash and find the correct bucket twice (which would happen twice if you get and set with two separate calls).
I would suggest that you check the cache first, if you do not find the value in the cache, then enter some form of critical section (lock, semaphore, etc.), re-check the cache, if still missing then fetch the value and insert into the cache.
This ensures that your backing store is only hit once; even if multiple requests get a cache miss at the same time, only the first one will actually fetch the value, the other requests will await the semaphore and then return early since they re-check the cache in the critical section.
Pseudocode (using SemaphoreSlim with a count of 1, since you can await it asynchronously):
async Task<TResult> GetAsync(TKey key)
{
// Try to fetch from cache
if (cache.TryGetValue(key, out var result)) return result;
// Get some resource lock here, for example use SemaphoreSlim
// which has async wait function:
await semaphore.WaitAsync();
try
{
// Try to fetch from cache again now that we have entered
// the critical section
if (cache.TryGetValue(key, out result)) return result;
// Fetch data from source (using your HttpClient or whatever),
// update your cache and return.
return cache[key] = await FetchFromSourceAsync(...);
}
finally
{
semaphore.Release();
}
}
Try this extension method:
/// <summary>
/// Adds a key/value pair to the <see cref="ConcurrentDictionary{TKey, TValue}"/> by using the specified function
/// if the key does not already exist. Returns the new value, or the existing value if the key exists.
/// </summary>
public static async Task<TResult> GetOrAddAsync<TKey,TResult>(
this ConcurrentDictionary<TKey,TResult> dict,
TKey key, Func<TKey,Task<TResult>> asyncValueFactory)
{
if (dict.TryGetValue(key, out TResult resultingValue))
{
return resultingValue;
}
var newValue = await asyncValueFactory(key);
return dict.GetOrAdd(key, newValue);
}
Instead of dict.GetOrAdd(key,key=>something(key)), you use await dict.GetOrAddAsync(key,async key=>await something(key)). Obviously, in this situation you just write it as await dict.GetOrAddAsync(key,something), but I wanted to make it clear.
In regards to concerns about preserving the order of operations, I have the following observations:
Using the normal GetOrAdd will have the same effect if you look at the way it is implemented. I literally used the same code and made it work for async. The reference says:
the valueFactory delegate is called outside the locks to avoid the
problems that can arise from executing unknown code under a lock.
Therefore, GetOrAdd is not atomic with regards to all other operations
on the ConcurrentDictionary<TKey,TValue> class
SyncRoot is not supported in ConcurrentDictionary; it uses an internal locking mechanism, so locking on it is not possible. Using your own lock mechanism works only for this extension method, though. If you use another flow (using GetOrAdd, for example) you will face the same problem.
Probably using a dedicated memory cache with advanced asynchronous capabilities, like the LazyCache by Alastair Crabtree, would be preferable to using a simple ConcurrentDictionary<K,V>. You would get commonly needed functionality like time-based expiration, or automatic eviction of entries that are dependent on other entries that have expired, or are dependent on mutable external resources (like files, databases etc). These features are not trivial to implement manually.
Below is a custom extension method GetOrAddAsync for ConcurrentDictionary instances that have Task<TValue> values. It accepts a factory method, and ensures that the method will be invoked at most once. It also ensures that failed tasks are removed from the dictionary.
/// <summary>
/// Returns an existing task from the concurrent dictionary, or adds a new task
/// using the specified asynchronous factory method. Concurrent invocations for
/// the same key are prevented, unless the task is removed before the completion
/// of the delegate. Failed tasks are evicted from the concurrent dictionary.
/// </summary>
public static Task<TValue> GetOrAddAsync<TKey, TValue>(
this ConcurrentDictionary<TKey, Task<TValue>> source, TKey key,
Func<TKey, Task<TValue>> valueFactory)
{
ArgumentNullException.ThrowIfNull(source);
ArgumentNullException.ThrowIfNull(valueFactory);
Task<TValue> currentTask;
if (source.TryGetValue(key, out currentTask))
return currentTask;
Task<Task<TValue>> newTaskTask = new(() => valueFactory(key));
Task<TValue> newTask = null;
newTask = newTaskTask.Unwrap().ContinueWith(task =>
{
if (!task.IsCompletedSuccessfully)
source.TryRemove(KeyValuePair.Create(key, newTask));
return task;
}, default, TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default).Unwrap();
currentTask = source.GetOrAdd(key, newTask);
if (ReferenceEquals(currentTask, newTask))
newTaskTask.RunSynchronously(TaskScheduler.Default);
return currentTask;
}
This method is implemented using the Task constructor to create a cold Task, which is started only if it is added successfully to the dictionary. Otherwise, if another thread wins the race to add the same key, the cold task is discarded. The advantage of this technique over the simpler Lazy<Task> is that if the valueFactory blocks the current thread, it won't also block other threads that are awaiting the same key. The same technique can be used for implementing an AsyncLazy<T> or an AsyncExpiringLazy<T> class.
Usage example:
ConcurrentDictionary<string, Task<JsonDocument>> cache = new();
JsonDocument document = await cache.GetOrAddAsync("https://example.com", async url =>
{
string content = await _httpClient.GetStringAsync(url);
return JsonDocument.Parse(content);
});
Overload with synchronous valueFactory delegate:
public static Task<TValue> GetOrAddAsync<TKey, TValue>(
this ConcurrentDictionary<TKey, Task<TValue>> source, TKey key,
Func<TKey, TValue> valueFactory)
{
ArgumentNullException.ThrowIfNull(valueFactory);
return source.GetOrAddAsync(key, key => Task.FromResult<TValue>(valueFactory(key)));
}
Both overloads invoke the valueFactory delegate on the current thread.
If you have some reason to prefer invoking the delegate on the ThreadPool, you can just replace the RunSynchronously with the Start.
For a version of the GetOrAddAsync method that compiles on .NET versions older than .NET 6, you can look at the 3rd revision of this answer.
I solved this years ago, before ConcurrentDictionary and the TPL were born. I'm in a café and don't have that original code, but it went something like this.
It's not a rigorous answer but may inspire your own solution. The important thing is to return the value that was just added or exists already along with the boolean so you can fork execution.
The design lets you easily fork the race winning logic vs. the losing logic.
public bool TryAddValue(TKey key, TValue value, out TValue contains)
{
// guards etc.
while (true)
{
if (this.concurrentDic.TryAdd(key, value))
{
contains = value;
return true;
}
else if (this.concurrentDic.TryGetValue(key, out var existing))
{
contains = existing;
return false;
}
else
{
// Slipped down the rare path. The value was removed between the
// above checks. I think just keep trying because we must have
// been really unlucky.
// Note this spinning will cause adds to execute out of
// order since a very unlucky add on a fast moving collection
// could in theory be bumped again and again before getting
// lucky and getting its value added, or locating existing.
// A tiny random sleep might work. Experiment under load.
}
}
}
This could be made into an extension method for ConcurrentDictionary, or be a method on your own cache or something using locks.
Perhaps a GetOrAdd(K,V) could be used with an Object.ReferenceEquals() to check if it was added or not, instead of the spin design.
To be honest, the above code isn't the point of my answer. The power comes in the simple design of the method signature and how it affords the following:
static readonly ConcurrentDictionary<string, Task<Task<Thing>>> tasks = new();
//
var newTask = new Task<Task<Thing>>(() => GetThingAsync(thingId));
if (tasks.TryAddValue(thingId, newTask, out var task))
{
task.Start();
}
var thingTask = await task;
var thing = await thingTask;
It's a little quirky how a Task needs to hold a Task (if your work is async), and there's the allocations of unused Tasks to consider.
I think it's a shame Microsoft didn't ship its thread-safe collection with this method, or extract a "concurrent collection" interface.
My real implementation was a cache with sophisticated expiring inner collections and stuff. I guess you could subclass the .NET Task class and add a CreatedAt property to aid with eviction.
Disclaimer: I've not tried this at all; it's off the top of my head, but I used this sort of design in an ultra-high-throughput app in 2009.

System.Reactive Throttling an async method

I have been putting off using reactive extensions for so long, and I thought this would be a good use. Quite simply, I have a method that can be called for various reasons on various code paths
private async Task GetProductAsync(string blah) {...}
I need to be able to throttle this method. That's to say, I want to stop the flow of calls until no more calls are made (for a specified period of time). Or more clearly, if 10 calls to this method happen within a certain time period, I want to limit (throttle) them to only 1 call, made after a quiet period following the last call.
I can see an example using a method with IEnumerable, this kind of makes sense
static IEnumerable<int> GenerateAlternatingFastAndSlowEvents()
{ ... }
...
var observable = GenerateAlternatingFastAndSlowEvents().ToObservable().Timestamp();
var throttled = observable.Throttle(TimeSpan.FromMilliseconds(750));
using (throttled.Subscribe(x => Console.WriteLine("{0}: {1}", x.Value, x.Timestamp)))
{
Console.WriteLine("Press any key to unsubscribe");
Console.ReadKey();
}
Console.WriteLine("Press any key to exit");
Console.ReadKey();
However (and this has always been my major issue with Rx), how do I create an Observable from a simple async method?
Update
I have managed to find an alternative approach using ReactiveProperty
Barcode = new ReactiveProperty<string>();
Barcode.Select(text => Observable.FromAsync(async () => await GetProductAsync(text)))
.Throttle(TimeSpan.FromMilliseconds(1000))
.Switch()
.ToReactiveProperty();
The premise is that I catch it at the text property Barcode. However, it has its own drawbacks: ReactiveProperty takes care of notification, and I can't silently update the backing field as it's already managed.
To summarise, how can I convert an async method call to an Observable, so I can use the Throttle method?
Unrelated to your question, but probably helpful: Rx's Throttle operator is really a debounce operator. The closest thing to a throttling operator is Sample. Here's the difference (assuming you want to throttle or debounce to one item / 3 seconds):
items : --1-23----4-56-7----8----9-
throttle: --1--3-----4--6--7--8-----9
debounce: --1-------4--6------8----9-
Sample/throttle will bunch items that arrive in the sensitive time and emit the last one on the next sampling tick. Debounce throws away items that arrive in the sensitive time, then re-starts the clock: The only way for an item to emit is if it was preceded by Time-Range of silence.
Rx.NET's Throttle operator does what debounce above depicts. Sample does what throttle above depicts.
If you want something different, describe how you want to throttle.
There are two key ways of converting a Task to an Observable, with an important difference between them.
Observable.FromAsync(()=>GetProductAsync("test"));
and
GetProductAsync("test").ToObservable();
The first will not start the Task until you subscribe to it.
The second will create (and start) the task and the result will either immediately or sometime later appear in the observable, depending on how fast the Task is.
Looking at your question in general though, it seems that you want to stop the flow of calls. You do not want to throttle the flow of results, which would result in unnecessary computation and loss.
If this is your aim, your GetProductAsync could be seen as an observer of call events, and the GetProductAsync should throttle those calls. One way of achieving that would be to declare a
public event Action<string> GetProduct;
and use
var callStream= Observable.FromEvent<string>(
handler => GetProduct+= handler ,
handler => GetProduct-= handler);
The problem then becomes how to return the result and what should happen when your 'caller's' call is throttled out and discarded.
One approach there could be to declare a type "GetProductCall" which would have the input string and output result as properties.
You could then have a setup like:
var callStream= Observable.FromEvent<GetProductCall>(
handler => GetProduct+= handler ,
handler => GetProduct-= handler)
.Throttle(...)
.Select(r=>async r.Result= await GetProductCall(r.Input).ToObservable().FirstAsync());
(code not tested, just illustrative)
Another approach might include the Merge(N) overload that limits the max number of concurrent observables.

Task.Run using custom thread pool

I have to address a temporary situation that requires me to do a non-ideal thing: I have to call an async method from inside a sync one.
Let me just say here that I know all about the problems I'm getting myself into and I understand reasons why this is not advised.
That said, I'm dealing with a large codebase which is completely sync from top to bottom, and there is no way I can rewrite everything to use async/await in a reasonable amount of time. But I do need to rewrite a number of small parts of this codebase to use the new async API that I've been slowly developing over the last year or so, because it has a lot of new features that the old codebase would benefit from as well but can't get for legacy reasons. And since all that code isn't going away any time soon, I'm facing a problem.
TL;DR: A large sync codebase cannot be easily rewritten to support async but now requires calls into another large codebase, which is completely async.
I'm currently doing the simplest thing that works in the sync codebase: wrapping each async call into a Task.Run and waiting for the Result in a sync way.
The problem with this approach is that it becomes slow whenever the sync codebase does this in a tight loop. I understand the reasons and I sort of know what I can do: I'd have to make sure that all async calls are started on the same thread instead of borrowing a new one from the thread pool each time (which is what Task.Run does). This borrowing and returning incurs a lot of switching, which can slow things down considerably if done a lot.
What are my options, short of writing my own scheduler that would prefer to reuse a single dedicated thread?
UPDATE: To better illustrate what I'm dealing with, I offer an example of one of the simplest transformations I need to do (there are more complex ones as well).
It's basically simple LINQ query that uses a custom LINQ provider under the hood. There's no EF or anything similar underneath.
[Old code]
var result = (from c in syncCtx.Query("Components")
where c.Property("Id") == id
select c).SingleOrDefault();
[New code]
var result = Task.Run(async () =>
{
Dictionary<string, object> data;
using (AuthorizationManager.Instance.AuthorizeAsInternal())
{
var uow = UnitOfWork.Current;
var source = await uow.Query("Components")
.Where("Id = @id", new { id })
.PrepareAsync();
var single = await source.SingleOrDefaultAsync();
data = single.ToDictionary();
}
return data;
}).Result;
As mentioned, this is one of the less complicated examples and it already contains 2 async calls.
UPDATE 2: I tried removing the Task.Run and invoking .Result directly on the result of a wrapper async method, as suggested by @Evk and @Manu. Unfortunately, while testing this in my staging environment, I quickly ran into a deadlock. I'm still trying to understand what exactly transpired, but it's obvious that Task.Run cannot simply be removed in my case. There are additional complications to be resolved first...
I don't think you are on the right track. Wrapping every async call in a Task.Run seems horrible to me; it always starts an additional task which you don't need. But I understand that introducing async/await in a large codebase can be problematic.
I see a possible solution: Extract all async calls into separate, async methods. This way, your project will have a pretty nice transition from sync to async, since you can change methods one by one without affecting other parts of the code.
Something like this:
private Dictionary<string, object> GetSomeData(string id)
{
var syncCtx = GetContext();
var result = (from c in syncCtx.Query("Components")
where c.Property("Id") == id
select c).SingleOrDefault();
DoSomethingSyncWithResult(result);
return result;
}
would become something like this:
private Dictionary<string, object> GetSomeData(string id)
{
var result = FetchComponentAsync(id).Result;
DoSomethingSyncWithResult(result);
return result;
}
private async Task<Dictionary<string, object>> FetchComponentAsync(string id)
{
using (AuthorizationManager.Instance.AuthorizeAsInternal())
{
var uow = UnitOfWork.Current;
var source = await uow.Query("Components")
.Where("Id = @id", new { id })
.PrepareAsync();
var single = await source.SingleOrDefaultAsync();
return single.ToDictionary();
}
}
Since you are in an ASP.NET environment, mixing sync with async is a very bad idea. I'm surprised that your Task.Run solution works for you. The more you incorporate the new async codebase into the old sync codebase, the more you will run into problems, and there is no easy fix for that except rewriting everything in an async way.
I strongly suggest that you not mix your async parts into the sync codebase. Instead, work from the bottom up: change everything from sync to async wherever you need to await an async call. It may seem like a lot of work, but the benefits are much higher than if you search for some "hacks" now and don't fix the underlying problems.

Changing each element in a list with different threads

I have the following pseudo-code:
public void Associar(List<Data> dados)
{
List<Task> tasks = new List<Task>();
foreach (var dado in dados)
{
tasks.Add(AdicionarAsync(dado));
}
Task.WaitAll(tasks.ToArray());
Debug.WriteLine(dados.Select(e => e.Colecao).Sum(e => e.Count));
}
public async Task<List<Foo>> ConsultarNoBanco()
{
//make request
//here the result is OK
return result;
}
public async Task AdicionarAsync(Data dado)
{
dado.Colecao = await ConsultarNoBanco();
//Here the result (dado.Colecao) is wrong
//If I modify the code to ConsultarNoBanco().Result everything works fine
}
The output of this code must always be 411. However, the result changes each time the method Associar() is called. What is the best way to use a thread-safe list to change each item in a collection from multiple threads?
Use Parallel.ForEach(); to modify entries in your list.
It will manage concurrency and threading for you.
You can also stop the loop early by calling Break() on the ParallelLoopState parameter of the loop body, e.g. state.Break();
The current answer(s)/comment(s) say that you need to manage concurrency since you are modifying entries in your list using tasks/threads. IMHO this is incorrect, since the modifications being made to your Data objects are fine: each task modifies only its designated Data object. No synchronization is necessary at that point.
On the other hand, your method ConsultarNoBanco is being executed from multiple tasks/threads, at the same time. Since you are not showing the code in the method, we cannot say anything about it. But it is my impression that this method is not thread-safe. Especially since the method is not receiving the Data object, and I can therefore only assume it is doing something not related to the Data object.
Can you show the code of the method ConsultarNoBanco? Is it thread-safe? You mention a request, is the request-handler thread-safe?
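To illustrate the point about per-item updates, here is a small sketch in which every task writes only to its own Data object. It assumes a query method that is thread-safe because it builds a fresh list on each call; QueryAsync, FillAsync and the list sizes are hypothetical and not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class Data
{
    public List<int> Colecao { get; set; } = new();
}

public static class PerItemDemo
{
    // Hypothetical thread-safe query: returns a new list every time
    // and touches no shared state.
    private static async Task<List<int>> QueryAsync(int size)
    {
        await Task.Delay(10);
        return Enumerable.Range(0, size).ToList();
    }

    // Each call writes to exactly one Data instance, so no locking is needed.
    private static async Task FillAsync(Data item, int size)
    {
        item.Colecao = await QueryAsync(size);
    }

    public static async Task Main()
    {
        var items = Enumerable.Range(0, 10).Select(_ => new Data()).ToList();

        // Start all updates, then wait for them without blocking a thread.
        await Task.WhenAll(items.Select((item, i) => FillAsync(item, i + 1)));

        Console.WriteLine(items.Sum(d => d.Colecao.Count)); // 55 (1 + 2 + ... + 10)
    }
}
```

If the total still varies with code shaped like this, the shared state is most likely inside the query method itself (for example a reused result list or a connection object that is not thread-safe).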
