How does this ConcurrentDictionary + Lazy<Task<T>> code work? - c#

There's various posts/answers that say that the .NET/.NET Core's ConcurrentDictionary GetOrAdd method is not thread-safe when the Func delegate is used to calculate the value to insert into the dictionary, if the key didn't already exist.
I'm under the belief that when using the factory method of a ConcurrentDictionary's GetOrAdd method, it could be called multiple times "at the same time/in really quick succession" if a number of requests occur at the "same time". This could be wasteful, especially if the call is "expensive". (#panagiotis-kanavos explains this better than I). With this assumption, I'm struggling to understand how some sample code I made, seems to work.
I've created a working sample on .NET Fiddle but I'm stuck trying to understand how it works.
A common recommendation suggestion/idea I've read is to have a Lazy<Task<T>> value in the ConcurrentDictionary. The idea is that the Lazy prevents other calls from executing the underlying method.
The main part of the code which does the heavy lifting is this:
public static async Task<DateTime> GetDateFromCache()
{
var result = await _cache.GetOrAdd("someDateTime", new Lazy<Task<DateTime>>(async () =>
{
// NOTE: i've made this method take 2 seconds to run, each time it's called.
var someData = await GetDataFromSomeExternalDependency();
return DateTime.UtcNow;
})).Value;
return result;
}
This is how I read this:
Check if someDateTime key exists in the dictionary.
If yes, return that. <-- That's a thread-safe atomic action. Yay!
If no, then here we go ....
Create an instance of a Lazy<Task<DateTime>> (which is basically instant)
Return that Lazy instance. (so far, the actual 'expensive' operation hasn't been called, yet.)
Now get the Value, which is a Task<DateTime>.
Now await this task .. which finally does the 'expensive' call. It waits 2 seconds .. and then returns the result (some point in Time).
Now this is where I'm all wrong. Because I'm assuming (above) that the value in the key/value is a Lazy<Task<DateTime>> ... which the await would call each time. If the await is called, one at a time (because the Lazy protects other callers from all calling at the same time) then I would have though that the result would a different DateTime with each independent call.
So can someone please explain where I'm wrong in my thinking, please?
(please refer to the full running code in .NET Fiddler).

Because I'm assuming (above) that the value in the key/value is a Lazy<Task<DateTime>>
Yes, that is true.
which the await would call each time. If the await is called, one at a time (because the Lazy protects other callers from all calling at the same time) then I would have though that the result would a different DateTime with each independent call.
await is not a call, it is more like "continue execution when the result is available". Accessing Lazy.Value will create the task, and this will initiate the call to the GetDataFromSomeExternalDependency that eventually returns the DateTime. You can await the task however many times you want and get the same result.

Related

What are best practices / good patterns for managing cached async data?

I am rewriting an old app and I am trying to use async to speed it up.
The old code was doing something like this:
var value1 = getValue("key1");
var value2 = getValue("key2");
var value3 = getValue("key3");
where the getValue function was managing its own cache in a dictionary, doing something like this:
object getValue(string key) {
if (cache.ContainsKey(key)) return cache[key];
var value = callSomeHttpEndPointsAndCalculateTheValue(key);
cache.Add(key, value);
return value;
}
If I make the getValue async and I await every call to getValue, then everything works well. But it is not faster than the old version because everything is running synchronously as it used to.
If I remove the await (well, if I postpone it, but that's not the focus of this question), I finally get the slow stuff to run in parallel. But if a second call to getValue("key1") is executed before the first call has finished, I end up with executing the same slow call twice and everything is slower than the old version, because it doesn't take advantage of the cache.
Is there something like await("key1") that will only await if a previous call with "key1" is still awaiting?
EDIT (follow-up to a comment)
By "speed it up" I mean more responsive.
For example when the user selects a material in a drop down, I want to update the list of available thicknesses or colors in other drop downs and other material properties in other UI elements. Sometimes this triggers a cascade of events that requires the same getValue("key") to used more than once.
For example when the material is changed, a few functions may be called: updateThicknesses(), updateHoleOffsets(), updateMaxWindLoad(), updateMaxHoleDistances(), etc. Each function reads the values from the UI elements and decides whether to do its own slow calculations independently from the other functions. Each function can require a few http calls to calculate some parameters, and some of those parameters may be required by several functions.
The old implementation was calling the functions in sequence, so the second function would take advantage of some values cached while processing the first one. The user would see each section of the interface updating in sequence over 5-6 seconds the first time and very quickly the following times, unless the new value required some new http endpoint calls.
The new async implementation calls all the functions at the same time, so every function ends up calling the same http endpoints because their results are not yet cached.
A simple method is to cache the tasks instead of the values, this way you can await both a pending task and an already completed task to get the values.
If several parallel tasks all try to get a value using the same key, only the first will spin off the task, the others will await the same task.
Here's a simple implementation:
private Dictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
lock (cache)
{
if (!cache.TryGetValue(key, out var result))
cache[key] = result = callSomeHttpEndPointsAndCalculateTheValueAsync(key);
return result;
}
}
Judging by the comments the following example should probably not be used.
Since [ConcurrentDictionary]() has been mentioned, here's a version using that instead.
private ConcurrentDictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
return cache.GetOrAdd(key, k => callSomeHttpEndPointsAndCalculateTheValueAsync(k));
}
The method seems simpler and that alone might be grounds for switching to it, but in my experience the ConcurrentDictionary and the other ConcurrentXXX collections seems to have their niche use and seems somewhat more heavyhanded and thus slower for the basic stuff.

Calling Async Method from Sync Method

I see a lot of questions about calling asyc methods from sync ones, like here: How to call asynchronous method from synchronous method in C#?
I understand the solutions however, experimentally, I am seeing that I don't need to add a .Wait() or Task.Run(async () => await MyAsyncMethod()) for it to run synchronously. I have a get method that pulls information from a DB.
My code is below and when I run, it waits at var task = GetItemAsync() and I get the correct result from the DB. I find it hard to believe that the DB call could come back before the code would move on which seems to imply that it ran synchronously. Am I missing something?
public object GetItem()
{
var task = GetItemAsync();
var result = task.Result;
return result
}
See The docs on Task<>.Result
System.Threading.Tasks.Task.Result is a blocking call that waits until the task finishes. So it has a built in wait call essentially.
In the remarks section of that microsoft doc:
Accessing the property's get accessor blocks the calling thread until the asynchronous operation is complete; it is equivalent to calling the Wait method.
It's unfortunate that Microsoft did not reflect this in the name of the accessor because it's an extremely critical side effect of this accessor.
The only right way to call awaitable(method which returns Task) method is to await it. But you can await only within the method which returns Task. In legacy code it happens you cannot change the method signature because you have too many reference and you need to basically refactor entire code base. So you have these options:
.Result
.Wait()
.GetAwaiter().GetResult()
Neither of those are good. All of them are going to block the thread. But if you cannot change the method signature to await the execution, I would pick the last option - GetAwaiter().GetResult() becasue it will not wrap the exception into AggregateException

Does adding a Task<T> to the return of a method make it run in a thread?

I am struggling to find any solid information online, but lets pretend I have a method like below:
public int DoSomething()
{
// some sync logic that takes a while
return someResult
}
So lets assume it is a method that is completely sync and currently will block the thread the caller is on for 100ms. So then lets say I wanted to create a version that would not block that thread while it executes, is it as simple as just doing:
public Task<int> DoSomethingAsync()
{
// some sync logic that takes a while
return Task.FromResult(someResult)
}
As I know Task.Run will cause code to be executed within a thread, but does .net basically interpret any method with a Task/<T> return type to be something that should be non blocking?
(I am hesitant to use the word async as it has mixed meanings in .net, but as shown above there is no await use here)
No. All that changing the return type does is... change the return type.
You're changing your promise from "when I return I will give you an int" to "when I return, I will give you something that will eventually provide an int" (with the caveat that the value may, in fact, already be available).
How your method comes up with the Task is entirely an internal implementation detail that is a concern for your method. In your current changed implementation, it does it by running entirely on the same thread from which it was called and then just stuffing the result into a Task at the end - to no real benefit.
If you had added the async keyword and left the return someResult; line you'd have been given more clues about this by the compiler:
This async method lacks 'await' operators and will run synchronously. Consider using ...

The proper way to await databound property getters c#

What would be the most correct way to use async method in databound property getter? I am talking about solid, scientific arguments, not personal preferences. I've read many threads about the problem, but not this specific case. Some of the solutions don't work in all the cases, and some of the suggestion, well... they were just too subjective or just wrong.
What I don't accept and why:
You can't - Actually, it is possible. There are many posts like "there are no such things like async properties", "it is against the design of the language" etc. but also there are many sensible explanations why such expressions are false
This should me method, not property - It can't be. I want to databind it. I provide property "proxies" for people using this code because in the future there may be different method to calculate this pseudo-property. And I want the View-side of the binding to be simple as possible
Use property to store the cached result of the method - that would defeat the purpose, it is actually something that changes dynamically and the class is an ORM Entity so it would store redundant data to the DB.
Use SomeTask.Result; or SomeTask.GetAwaiter().GetResult() - In most cases I would just use it. I've successfully used those in many cases i.e. Console applications. It's nice, clear and easily readable. But when I use it in databound property I get a deadlock
Problem background (simplified)
Let's say that I am responsible for developing ORM mechanism in a project. There was a first stable version, but now I want to add some properties to the Entities for the DataBinders who are responsible for the layout. I can edit Entity layer, but I can't edit Mapping and Repository layers. (I am not held againt my will, this situation is fictional simplification). All the methods in repositories are async. All I can do is ask someone responsible to provide identical synchronous methods for all of the methods, but it would be stupid to this kind of redundant work.
Only solution I can use now
_something = Task.Run(async () => await AnotherRepository.CalculateStuff(this)).Result;
And it just doesn't look right to me. It works, but I have to await my method inside the lambda in Task.Run(). I am stuck with it for the time being, and I want to know the simplest and correct approach.
Repository method pseudo-code
public async static Task<IList<CalculatedStuff>> CalculateStuff(SomeClass class)
{
return await Task.Run(() =>
{
using (var session = Helper.OpenSession())
return session.CreateCriteria(typeof(CalculatedStuff)).Add(Restrictions.Eq("SomeClass", class))
///...
.List<CalculatedStuff>();
});
}
there are no such things like async properties
I have a blog post and MSDN article on "async properties" for data binding. I do take the stance that they are not natural, which is based on these (objective) observations:
Properties read by data binding must return immediately (synchronously).
Asynchronous operations are asynchronous (that is, they complete after some time).
Clearly, these are at complete odds with one another.
Now, there are a few different solutions, but any solution that attempts to violate one of these observations is going to be dubious, at best.
For example, you can attempt to violate the second observation by trying to run the asynchronous operation synchronously. As you discovered, Result / Wait / GetAwaiter().GetResult() will deadlock (for reasons described in detail on my blog). Task.Run(() => ...).GetAwaiter().GetResult() will avoid the deadlock but will execute the code in a free-threaded context (which is OK for most code but not all). These are two different kinds of sync-over-async; I call them the "Blocking Hack" and the "Thread Pool Hack" in my Async Brownfield article, which also covers two other kinds of sync-over-async patterns.
Unfortunately, there is no solution for sync-over-async that works in every scenario. Even if you get it to work, your users would get a substandard experience (blocking the UI thread for an indefinite amount of time), and you may have problems with app stores (I believe MS's at least will actively check for blocking the UI thread and auto-reject). IMO, sync-over-async is best avoided.
However, we obviously cannot violate the first observation, either. If we're data binding to the result of some asynchronous operation, we can't very well return it before the operation completes!
Or can we?
What if we change what the data binding is attaching to? Say, introduce a property that has a default value before the operation is completed, and changes (via INotifyPropertyChanged) to the result of the operation when the operation completes. That sounds reasonable... And we can stick in another property to indicate to the UI that the operation is in progress! And maybe another one to indicate if the operation failed...
This is the line of thinking that resulted in my NotifyTaskCompletion type in the article on data binding (updated NotifyTask type here). It is essentially a data-bindable wrapper for Task<T>, so the UI can respond dynamically to the asynchronous operation without trying to force it to be synchronous.
This does require some changes to the bindings, but you get a nice side effect that your UX is better (non-blocking).
This should me method, not property
Well, you can do this as a property:
TEntity Entity { get { return NotifyTask.Create(() => Repository.GetEntityAsync()); } }
// Data bind to Entity.Result for the results.
// Data bind to Entity.IsNotCompleted for a busy spinner.
However, I would say that it's surprising behavior to have a property read kick off something significant like a database query or HTTP download. That's a pretty wide definition of "property". IMO, this would be better represented as a method, which connotates action more than a property does (or perhaps as part of an asynchronous initialization, which I also describe on my blog). Put another way: I prefer my properties without side effects. Reading a property more than once and having it return different values is counterintuitive. This final paragraph is entirely my own opinion. :)
If you have access to the source code of AnotherRepository.CalculateStuff, you can implement it in a way that won't deadlock when called from bound property. First short summary of why it deadlocks. When you await something, current synchronization context is remembered and the rest of the method (after async) is executed on that context. For UI applications that means the rest of the method is executed on UI thread. But in your case UI thread is already blocked by waiting for the Result of task - hence deadlock.
But there is method of Task named ConfigureAwait. If you pass false for it's only argument (named continueOnCapturedContext) and await task returned by this method - it won't continue on captured context, which will solve your problem. So suppose you have:
// this is UI bound
public string Data
{
get { return GetData().Result; }
}
static async Task<string> GetData() {
await Task.Run(() =>
{
Thread.Sleep(2000);
});
return "test!";
}
This will deadlock when called from UI thread. But if you change it:
static async Task<string> GetData() {
await Task.Run(() =>
{
Thread.Sleep(2000);
}).ConfigureAwait(false);
return "test!";
}
It won't any more.
For those who might read this later - don't do it this way, only if for temporary debugging purposes. Instead return dummy object from your property getter with some IsLoading flag set to true, and meanwhile load data in background and fill dummy object properties when done. This will not freeze your UI during long blocking operation.

Returning result of async method without awaiting - bad idea?

I'm looking at some code written a while back that is making me very nervous. The general shape of the methods in questions is like this;
public Task Foo(...){
SyncMethod();
SyncMethod();
...
return AsyncMethod();
}
I realize I can just mark the method async and do an await on the last call, but...do I have to? Is this safe to use as is? The sync methods that are called before the last async method do not have asynchronous alternatives and the Foo method does not care about the result of the AsyncMethod, so it looks like it's ok to just return the awaitable to the caller and let the caller deal with it.
Also, FWIW, the code in question might be called from a thread that's running with an active ASP.NET context, so I'm getting that tingling feeling in my brain that there's some invisible evil that I'm not seeing here?
I realize I can just mark the method async and do an await on the last call, but...do I have to?
Actually, if you did this, the only change that it would have is that exceptions thrown by either of those synchronous methods would be wrapped into the returned Task, while in the current implementation the method would throw without ever successfully returning a Task. The actual effect of what work is done synchronously and what work is done asynchronously is entirely unaffected.
Having said that, both of the options you've mentioned are worrisome. Here you have a method that appears to be asynchronous, meaning someone calling it expects it to return more or less right away, but in reality the method will run synchronously for some amount of time.
If your two synchronous methods are really fast and as a result, you're confident that this very small amount of synchronous work won't be noticeable to any of your callers, then that's fine. If, however, that work will (even potentially) take a non-trivial amount of time to solve, then you have a problem on your hands.
Having actually asynchronous alternatives for those methods would be best, but as a last resort, until you have a better option, often the best you can manage to do is to await Task.Run(() => SyncMethod()); for those methods (which of course means the method now needs to be marked as async).

Categories