Is ConcurrentDictionary.GetOrAdd truly thread-safe? - c#

I have this piece of code where I want to await an ongoing task if that task was created for the same input. Here is a minimal reproduction of what I'm doing.
private static ConcurrentDictionary<int, Task<int>> _tasks = new ConcurrentDictionary<int, Task<int>>();
private readonly ExternalService _service;
public async Task<int> SampleTask()
{
    var result = await _service.DoSomething();
    await Task.Delay(1000); // this task takes some time to finish
    return result;
}
public async Task<int> DoTask(int key)
{
    var task = _tasks.GetOrAdd(key, _ => SampleTask());
    var taskResult = await task;
    _tasks.TryRemove(key, out task);
    return taskResult;
}
I'm writing a test to ensure the same task is awaited when multiple requests want to perform the task at (roughly) the same time. I'm doing that by mocking _service and counting how many times _service.DoSomething() is called. It should be only once if the calls to DoTask(int key) were made at roughly the same time.
However, the results show that if I call DoTask(int key) more than once with a delay of less than 1-2 ms between calls, each call creates and executes its own instance of SampleTask(), with the second one replacing the first one in the dictionary.
Considering this, can we say that this method is truly thread-safe? Or isn't my problem a case of thread-safety per se?

To quote the documentation (emphasis mine):
For modifications and write operations to the dictionary, ConcurrentDictionary<TKey,TValue> uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, the valueFactory delegate is called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, GetOrAdd is not atomic with regards to all other operations on the ConcurrentDictionary<TKey,TValue> class.
Since a key/value can be inserted by another thread while valueFactory is generating a value, you cannot trust that just because valueFactory executed, its produced value will be inserted into the dictionary and returned. If you call GetOrAdd simultaneously on different threads, valueFactory may be called multiple times, but only one key/value pair will be added to the dictionary.
So while the dictionary is properly thread-safe, calls to the valueFactory, or _ => SampleTask() in your case, are not guaranteed to be unique. So your factory function should be able to live with that fact.
You can confirm this from the source:
public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
{
if (key == null) throw new ArgumentNullException("key");
if (valueFactory == null) throw new ArgumentNullException("valueFactory");
TValue resultingValue;
if (TryGetValue(key, out resultingValue))
{
return resultingValue;
}
TryAddInternal(key, valueFactory(key), false, true, out resultingValue);
return resultingValue;
}
As you can see, valueFactory is called outside of TryAddInternal, which is responsible for locking the dictionary properly.
However, since valueFactory is a lambda function that returns a task in your case (_ => SampleTask()), and the dictionary will not await that task itself, the function will finish quickly and just return the incomplete Task after encountering the first await (when the async state machine is set up). So unless the calls come in very quick succession, the task should be added to the dictionary almost immediately and subsequent calls will reuse the same task.
If you require this to happen just once in all cases, you should consider locking on the task creation yourself. Since it will finish quickly (regardless of how long your task actually takes to resolve), locking will not hurt that much.
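One possible shape for "locking on the task creation yourself" is sketched below. It is only an illustration: TaskCoalescer, its _gate field and the work delegate are hypothetical names standing in for the question's DoTask and SampleTask, and it assumes the delegate is an async method that returns at its first incomplete await, so the lock is held only briefly.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical sketch: TaskCoalescer and its members are illustrative names,
// not part of the question's code or of any library.
public sealed class TaskCoalescer
{
    private readonly object _gate = new object();
    private readonly Dictionary<int, Task<int>> _tasks = new Dictionary<int, Task<int>>();
    private readonly Func<int, Task<int>> _work;

    public TaskCoalescer(Func<int, Task<int>> work) => _work = work;

    public async Task<int> DoTask(int key)
    {
        Task<int> task;
        lock (_gate)
        {
            // Only the lookup and the task creation happen under the lock.
            // The work is never awaited here.
            if (!_tasks.TryGetValue(key, out task))
            {
                task = _work(key);
                _tasks[key] = task;
            }
        }

        try
        {
            return await task;
        }
        finally
        {
            lock (_gate)
            {
                // Remove the entry only if it is still the task we awaited,
                // so a newer task for the same key is not evicted by mistake.
                if (_tasks.TryGetValue(key, out var current) && ReferenceEquals(current, task))
                    _tasks.Remove(key);
            }
        }
    }
}
```

With this shape, two overlapping calls such as coalescer.DoTask(1) and coalescer.DoTask(1) share one invocation of the delegate, while a call made after the first has completed starts a fresh one.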

Related

What are best practices / good patterns for managing cached async data?

I am rewriting an old app and I am trying to use async to speed it up.
The old code was doing something like this:
var value1 = getValue("key1");
var value2 = getValue("key2");
var value3 = getValue("key3");
where the getValue function was managing its own cache in a dictionary, doing something like this:
object getValue(string key) {
if (cache.ContainsKey(key)) return cache[key];
var value = callSomeHttpEndPointsAndCalculateTheValue(key);
cache.Add(key, value);
return value;
}
If I make the getValue async and I await every call to getValue, then everything works well. But it is not faster than the old version because everything is running synchronously as it used to.
If I remove the await (well, if I postpone it, but that's not the focus of this question), I finally get the slow stuff to run in parallel. But if a second call to getValue("key1") is executed before the first call has finished, I end up with executing the same slow call twice and everything is slower than the old version, because it doesn't take advantage of the cache.
Is there something like await("key1") that will only await if a previous call with "key1" is still awaiting?
EDIT (follow-up to a comment)
By "speed it up" I mean more responsive.
For example, when the user selects a material in a drop down, I want to update the list of available thicknesses or colors in other drop downs and other material properties in other UI elements. Sometimes this triggers a cascade of events that requires the same getValue("key") to be used more than once.
For example when the material is changed, a few functions may be called: updateThicknesses(), updateHoleOffsets(), updateMaxWindLoad(), updateMaxHoleDistances(), etc. Each function reads the values from the UI elements and decides whether to do its own slow calculations independently from the other functions. Each function can require a few http calls to calculate some parameters, and some of those parameters may be required by several functions.
The old implementation was calling the functions in sequence, so the second function would take advantage of some values cached while processing the first one. The user would see each section of the interface updating in sequence over 5-6 seconds the first time and very quickly the following times, unless the new value required some new http endpoint calls.
The new async implementation calls all the functions at the same time, so every function ends up calling the same http endpoints because their results are not yet cached.
A simple method is to cache the tasks instead of the values. This way you can await both a pending task and an already completed task to get the values.
If several parallel tasks all try to get a value using the same key, only the first will spin off the task, the others will await the same task.
Here's a simple implementation:
private Dictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
lock (cache)
{
if (!cache.TryGetValue(key, out var result))
cache[key] = result = callSomeHttpEndPointsAndCalculateTheValueAsync(key);
return result;
}
}
Judging by the comments the following example should probably not be used.
Since ConcurrentDictionary has been mentioned, here's a version using that instead.
private ConcurrentDictionary<string, Task<object>> cache = new();
public Task<object> getValueAsync(string key)
{
return cache.GetOrAdd(key, k => callSomeHttpEndPointsAndCalculateTheValueAsync(k));
}
The method seems simpler and that alone might be grounds for switching to it, but in my experience ConcurrentDictionary and the other ConcurrentXXX collections have their niche uses and seem somewhat more heavy-handed, and thus slower, for the basic stuff.

Is this immutable object threadsafe?

I have a class which loads some data from a server and transforms it. The class contains a method that reloads this data from the server.
I'm not sure if the reload is threadsafe, but I read that I might need to add the volatile keyword or use locks.
public class Tenants : ITenants
{
private readonly string url = "someurl";
private readonly IHttpClientFactory httpClientFactory;
private ConfigParser parser;
public Tenants(IHttpClientFactory httpClientFactory)
{
this.httpClientFactory = httpClientFactory;
}
public async Task Refresh()
{
TConfig data = await ConfigLoader.GetData(httpClientFactory.CreateClient(), url);
parser = new ConfigParser(data);
}
public async Task<TConfig> GetSettings(string name)
{
if (parser == null)
await Refresh();
return parser.GetSettings(name);
}
}
public class ConfigParser
{
private readonly ImmutableDictionary<string, TConfig> configs;
public ConfigParser(TConfig[] configs)
{
this.configs = configs.ToImmutableDictionary(s => s.name, v => v);
}
public TConfig GetSettings(string name)
{
if (!configs.ContainsKey(name))
{
return null;
}
return configs[name];
}
}
The Tenants class will be injected as a singleton into other classes via DI/IoC.
I think that this design makes this threadsafe.
It is fully atomic, and immutable with no exposed members to be changed by any consuming code. (TConfig is also immutable)
I also don't think I need a lock: if 2 threads try to set the reference at the same time, the last one wins, which I am happy with.
And I don't know enough to understand if I need volatile. But from what I understood about it, I won't need it, as there is only 1 reference of parser that I care about, and it's never exposed outside this class.
But I think some of my statements/assumptions above could be wrong.
EDIT:
From your comments I can deduce that you do not understand the difference between immutable and thread safety.
Immutability means an instance of an object cannot be mutated (its internal or external state cannot change).
Thread safe means multiple threads can access the class/method without causing errors like race conditions, deadlocks or unexpected behavior like something which has to be executed only once is executed twice.
Immutable objects are thread safe, but something doesn't have to be immutable to be thread safe.
Your Tenants class is neither immutable nor thread safe because:
Its internal state can change after instantiation.
It contains unexpected behavior where the request to receive the config is executed twice, where it should only happen once.
If you read my answer below you can determine that if you are ok with the request happening twice (which you shouldn't be): You don't have to do anything, but you could add the volatile keyword to the parser field to prevent SOME scenarios, but definitely not all.
You don't see any locks in immutable objects because there's no writing happening to the state of the object.
When there are writing operations in an object it is not immutable anymore (like your Tenants class). To make an object like that thread safe, you need to lock the write operations that can cause errors like the unexpected behavior of something which has to be executed only once is executed twice.
ConfigParser seems to be thread safe; Tenants, however, definitely isn't.
Your Tenants class is also not immutable, since it exposes a method which changes the state of the class (both the GetSettings and Refresh methods).
If 2 threads call GetSettings at the same time when parser is null, 2 requests will be made to receive the ConfigParser. You can be OK with this, but it is bad practice, and also means the method is not thread safe.
If you are fine with the request being executed twice you could use volatile here:
The volatile keyword indicates that a field might be modified by multiple threads that are executing at the same time. The compiler, the runtime system, and even hardware may rearrange reads and writes to memory locations for performance reasons. Fields that are declared volatile are not subject to these optimizations. Adding the volatile modifier ensures that all threads will observe volatile writes performed by any other thread in the order in which they were performed.
Volatile will prevent threads from seeing outdated values. This means you could prevent some of the extra requests (from the threads which still think parser is null), but it will not completely prevent a method or instruction from being executed multiple times at the same time.
In this situation you need to lock:
The lock statement acquires the mutual-exclusion lock for a given object, executes a statement block, and then releases the lock. While a lock is held, the thread that holds the lock can again acquire and release the lock. Any other thread is blocked from acquiring the lock and waits until the lock is released.
Meaning you can prevent multiple threads from executing a method or instruction multiple times at the same time.
Unfortunately, you can't use await inside a lock.
What you want to do is:
If Refresh needs to be called:
    If another thread is already working on the Refresh:
        Wait for the other thread to finish, and do not call Refresh
        Continue with the result from the other thread
    If no other thread is already working on the Refresh:
        Invoke the Refresh method
I have written a library for this called TaskSynchronizer. You can use that to accomplish a truly thread-safe version of your Tenants class.
Example:
public static TaskSynchronizer Synchronizer = new TaskSynchronizer();
public static async Task DoWork()
{
await Task.Delay(100); // Some heavy work.
Console.WriteLine("Work done!");
}
public static async Task WorkRequested()
{
using (Synchronizer.Acquire(DoWork, out var task)) // Synchronize the call to work.
{
await task;
}
}
static void Main(string[] args)
{
var tasks = new List<Task>();
for (var i = 0; i < 2; i++)
{
tasks.Add(WorkRequested());
}
Task.WaitAll(tasks.ToArray());
}
will output:
Work done!
I.e., the async DoWork method is invoked only once, even though it was requested twice at the same time.
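For readers who would rather not take a dependency, the same "join the refresh that is already running" behaviour can be approximated by caching the in-flight Task and guarding only its creation with a lock, which sidesteps the restriction on awaiting inside a lock. The following is a minimal sketch under that assumption; SingleFlightLoader and its members are hypothetical names and not part of TaskSynchronizer or of the question's code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch: SingleFlightLoader is an illustrative name.
public sealed class SingleFlightLoader<T>
{
    private readonly object _gate = new object();
    private readonly Func<Task<T>> _load;
    private Task<T> _current; // the in-flight or most recently started load

    public SingleFlightLoader(Func<Task<T>> load) => _load = load;

    // Returns the cached load, starting one only if none has been started yet.
    public Task<T> GetAsync()
    {
        lock (_gate)
        {
            // Nothing is awaited inside the lock: the task is only created or read.
            return _current ??= _load();
        }
    }

    // Starts a new load unless one is still running, in which case callers join it.
    public Task<T> RefreshAsync()
    {
        lock (_gate)
        {
            if (_current == null || _current.IsCompleted)
                _current = _load();
            return _current;
        }
    }
}
```

In the Tenants example, the load delegate would fetch the data and build the ConfigParser, and GetSettings would await GetAsync() before reading from the parser. One design caveat of this sketch: a failed load stays cached until RefreshAsync is called again.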

ConcurrentDictionary GetOrAdd async

I want to use something like GetOrAdd with a ConcurrentDictionary as a cache to a webservice. Is there an async version of this dictionary? GetOrAdd will be making a web request using HttpClient, so it would be nice if there was a version of this dictionary where GetOrAdd was async.
To clear up some confusion, the contents of the dictionary will be the response from a call to a webservice.
ConcurrentDictionary<string, Response> _cache
= new ConcurrentDictionary<string, Response>();
var response = _cache.GetOrAdd("id",
x => _httpClient.GetAsync(x).GetAwaiter().GetResult());
GetOrAdd won't become an asynchronous operation because accessing the value of a dictionary isn't a long running operation.
What you can do however is simply store tasks in the dictionary, rather than the materialized result. Anyone needing the results can then await that task.
However, you also need to ensure that the operation is only ever started once, and not multiple times. For that, you also need to wrap the task in a Lazy:
ConcurrentDictionary<string, Lazy<Task<Response>>> _cache = new ConcurrentDictionary<string, Lazy<Task<Response>>>();
var response = await _cache.GetOrAdd("id", url => new Lazy<Task<Response>>(() => _httpClient.GetAsync(url))).Value;
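To make the role of Lazy in the snippet above more concrete, here is a small sketch of the same idea wrapped in a helper class. LazyTaskCache and its GetOrAddAsync method are hypothetical names, and the factory delegate stands in for the HttpClient call. The point it tries to show is that GetOrAdd may build several Lazy wrappers under contention, but only the stored one has its Value read, so the factory runs at most once per key.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: LazyTaskCache and its members are illustrative names.
public sealed class LazyTaskCache<TKey, TValue>
{
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _cache =
        new ConcurrentDictionary<TKey, Lazy<Task<TValue>>>();

    public Task<TValue> GetOrAddAsync(TKey key, Func<TKey, Task<TValue>> factory)
    {
        // Creating a Lazy is cheap and has no side effects, so it does not matter
        // if GetOrAdd invokes this delegate more than once for the same key.
        var lazy = _cache.GetOrAdd(
            key,
            k => new Lazy<Task<TValue>>(
                () => factory(k),
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Only the Lazy that ended up in the dictionary is ever evaluated.
        return lazy.Value;
    }
}
```

A call then looks like await cache.GetOrAddAsync("id", url => FetchAsync(url)), where FetchAsync is whatever asynchronous request you need (a placeholder here). Note that this sketch keeps faulted tasks in the cache; evicting them is a separate concern.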
The GetOrAdd method is not that great to use for this purpose. Since it does not guarantee that the factory runs only once, the only purpose it has is a minor optimization (minor since additions are rare anyway) in that it doesn't need to hash and find the correct bucket twice (which would happen twice if you get and set with two separate calls).
I would suggest that you check the cache first, if you do not find the value in the cache, then enter some form of critical section (lock, semaphore, etc.), re-check the cache, if still missing then fetch the value and insert into the cache.
This ensures that your backing store is only hit once; even if multiple requests get a cache miss at the same time, only the first one will actually fetch the value, the other requests will await the semaphore and then return early since they re-check the cache in the critical section.
Pseudocode (using SemaphoreSlim with a count of 1, since you can await it asynchronously):
async Task<TResult> GetAsync(TKey key)
{
// Try to fetch from cache
if (cache.TryGetValue(key, out var result)) return result;
// Get some resource lock here, for example use SemaphoreSlim
// which has async wait function:
await semaphore.WaitAsync();
try
{
// Try to fetch from cache again now that we have entered
// the critical section
if (cache.TryGetValue(key, out result)) return result;
// Fetch data from source (using your HttpClient or whatever),
// update your cache and return.
return cache[key] = await FetchFromSourceAsync(...);
}
finally
{
semaphore.Release();
}
}
Try this extension method:
/// <summary>
/// Adds a key/value pair to the <see cref="ConcurrentDictionary{TKey, TValue}"/> by using the specified function
/// if the key does not already exist. Returns the new value, or the existing value if the key exists.
/// </summary>
public static async Task<TResult> GetOrAddAsync<TKey,TResult>(
this ConcurrentDictionary<TKey,TResult> dict,
TKey key, Func<TKey,Task<TResult>> asyncValueFactory)
{
if (dict.TryGetValue(key, out TResult resultingValue))
{
return resultingValue;
}
var newValue = await asyncValueFactory(key);
return dict.GetOrAdd(key, newValue);
}
Instead of dict.GetOrAdd(key,key=>something(key)), you use await dict.GetOrAddAsync(key,async key=>await something(key)). Obviously, in this situation you just write it as await dict.GetOrAddAsync(key,something), but I wanted to make it clear.
In regards to concerns about preserving the order of operations, I have the following observations:
Using the normal GetOrAdd will get the same effect if you look at the way it is implemented. I literally used the same code and made it work for async. Reference says
the valueFactory delegate is called outside the locks to avoid the
problems that can arise from executing unknown code under a lock.
Therefore, GetOrAdd is not atomic with regards to all other operations
on the ConcurrentDictionary<TKey,TValue> class
SyncRoot is not supported in ConcurrentDictionary, they use an internal locking mechanism, so locking on it is not possible. Using your own lock mechanism works only for this extension method, though. If you use another flow (using GetOrAdd for example) you will face the same problem.
Probably using a dedicated memory cache with advanced asynchronous capabilities, like the LazyCache by Alastair Crabtree, would be preferable to using a simple ConcurrentDictionary<K,V>. You would get commonly needed functionality like time-based expiration, or automatic eviction of entries that are dependent on other entries that have expired, or are dependent on mutable external resources (like files, databases etc). These features are not trivial to implement manually.
Below is a custom extension method GetOrAddAsync for ConcurrentDictionary instances that have Task<TValue> values. It accepts a factory method, and ensures that the method will be invoked at most once. It also ensures that failed tasks are removed from the dictionary.
/// <summary>
/// Returns an existing task from the concurrent dictionary, or adds a new task
/// using the specified asynchronous factory method. Concurrent invocations for
/// the same key are prevented, unless the task is removed before the completion
/// of the delegate. Failed tasks are evicted from the concurrent dictionary.
/// </summary>
public static Task<TValue> GetOrAddAsync<TKey, TValue>(
this ConcurrentDictionary<TKey, Task<TValue>> source, TKey key,
Func<TKey, Task<TValue>> valueFactory)
{
ArgumentNullException.ThrowIfNull(source);
ArgumentNullException.ThrowIfNull(valueFactory);
Task<TValue> currentTask;
if (source.TryGetValue(key, out currentTask))
return currentTask;
Task<Task<TValue>> newTaskTask = new(() => valueFactory(key));
Task<TValue> newTask = null;
newTask = newTaskTask.Unwrap().ContinueWith(task =>
{
if (!task.IsCompletedSuccessfully)
source.TryRemove(KeyValuePair.Create(key, newTask));
return task;
}, default, TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default).Unwrap();
currentTask = source.GetOrAdd(key, newTask);
if (ReferenceEquals(currentTask, newTask))
newTaskTask.RunSynchronously(TaskScheduler.Default);
return currentTask;
}
This method is implemented using the Task constructor for creating a cold Task, which is started only if it is added successfully to the dictionary. Otherwise, if another thread wins the race to add the same key, the cold task is discarded. The advantage of using this technique over the simpler Lazy<Task> is that in case the valueFactory blocks the current thread, it won't also block other threads that are awaiting the same key. The same technique can be used for implementing an AsyncLazy<T> or an AsyncExpiringLazy<T> class.
Usage example:
ConcurrentDictionary<string, Task<JsonDocument>> cache = new();
JsonDocument document = await cache.GetOrAddAsync("https://example.com", async url =>
{
string content = await _httpClient.GetStringAsync(url);
return JsonDocument.Parse(content);
});
Overload with synchronous valueFactory delegate:
public static Task<TValue> GetOrAddAsync<TKey, TValue>(
this ConcurrentDictionary<TKey, Task<TValue>> source, TKey key,
Func<TKey, TValue> valueFactory)
{
ArgumentNullException.ThrowIfNull(valueFactory);
return source.GetOrAddAsync(key, key => Task.FromResult<TValue>(valueFactory(key)));
}
Both overloads invoke the valueFactory delegate on the current thread.
If you have some reason to prefer invoking the delegate on the ThreadPool, you can just replace the RunSynchronously with the Start.
For a version of the GetOrAddAsync method that compiles on .NET versions older than .NET 6, you can look at the 3rd revision of this answer.
I solved this years ago before ConcurrentDictionary and the TPL was born. I'm in a café and don't have that original code but it went something like this.
It's not a rigorous answer but may inspire your own solution. The important thing is to return the value that was just added or exists already along with the boolean so you can fork execution.
The design lets you easily fork the race winning logic vs. the losing logic.
public bool TryAddValue(TKey key, TValue value, out TValue contains)
{
// guards etc.
while (true)
{
if (this.concurrentDic.TryAdd(key, value))
{
contains = value;
return true;
}
else if (this.concurrentDic.TryGetValue(key, out var existing))
{
contains = existing;
return false;
}
else
{
// Slipped down the rare path. The value was removed between the
// above checks. I think just keep trying because we must have
// been really unlucky.
// Note this spinning will cause adds to execute out of
// order since a very unlucky add on a fast moving collection
// could in theory be bumped again and again before getting
// lucky and getting its value added, or locating existing.
// A tiny random sleep might work. Experiment under load.
}
}
}
This could be made into an extension for ConcurrentDictionary, or be a method on your own cache or something using locks.
Perhaps a GetOrAdd(K,V) could be used with an Object.ReferenceEquals() to check if it was added or not, instead of the spin design.
To be honest, the above code isn't the point of my answer. The power comes in the simple design of the method signature and how it affords the following:
static readonly ConcurrentDictionary<string, Task<Task<Thing>>> tasks = new();
//
var newTask = new Task<Task<Thing>>(() => GetThingAsync(thingId));
if (this.tasks.TryAddValue(thingId, newTask, out var task))
{
task.Start();
}
var thingTask = await task;
var thing = await thingTask;
It's a little quirky how a Task needs to hold a Task (if your work is async), and there's the allocations of unused Tasks to consider.
I think it's a shame Microsoft didn't ship its thread-safe collection with this method, or extract a "concurrent collection" interface.
My real implementation was a cache with sophisticated expiring inner collections and stuff. I guess you could subclass the .NET Task class and add a CreatedAt property to aid with eviction.
Disclaimer I've not tried this at all, it's off top of head, but I used this sort of design in an ultra-hi thru-put app in 2009.

Running Async Foreach Loop C# async await

I am struggling to grasp the basic concept of c# async await.
Basically what I have is a List of objects which I need to process, the processing involves iterating through its properties and joining strings, and then creating a new object (in this case called a trellocard) and eventually adding a list of trellocards.
The iteration takes quite a long time, so what I would like to do is process multiple objects asynchronously.
I've tried multiple approaches, but basically I want to do something like this. (In the example below I have removed the processing and just put System.Threading.Thread.Sleep(2000). I'm aware that this is NOT an async method, and I could use Task.Delay, but the point is that my processing does not have any async methods; I just want to run the entire method with multiple instances.)
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
    List<TrelloCard> cards = new List<TrelloCard>();
    foreach (var job in jobs.ToList())
    {
        var card = await ProcessCards(job, cards); // I would like to run multiple instances of the processing
        cards.Add(card); // Once each instance is finished it adds it to the list
    }
    return cards;
}
private async Task<TrelloCard> ProcessCards(Job job)
{
    System.Threading.Thread.Sleep(2000); // Just for example's sake
    return new TrelloCard();
}
I am struggling to grasp the basic concept of c# async await.
A simple definition: async/await is the part of .NET concurrency meant for making multiple I/O calls without tying up threads, which are better spent on compute operations. Typical cases are calls to a database or web service, network calls, and file I/O, none of which need the calling thread while they are in flight.
In your current case, where the use case is:
iterating through its properties and joining strings, and then creating a new object
eventually adding a list of trellocards
This seems to be a compute-bound operation. Unless you are doing I/O, it looks like you are traversing an in-memory object, and for that the better choice would be:
Parallel.ForEach, to parallelize the in-memory processing. You need to be careful about race conditions, though: the same memory may be accessed by multiple threads and corrupted, especially during writes. So at least in the current code, use a thread-safe collection such as ConcurrentBag from the System.Collections.Concurrent namespace (or whichever suits the use case) instead of List<TrelloCard>, or consider a thread-safe list.
Also note that if your methods are not asynchronous by default, you can wrap them in Task.Run and await the result. This still needs a thread-pool thread, but it lets you call them with async/await.
Parallel.ForEach code for your use case (this is a direct replacement; note that your code has an issue: the ProcessCards function takes only a Job object, but you also pass it the cards collection, which is a compilation error):
private List<TrelloCard> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
ConcurrentBag<TrelloCard> cards = new ConcurrentBag<TrelloCard>();
Parallel.ForEach(jobs.ToList(), (job) =>
{
var card = ProcessCards(job); // Each iteration may run on a different thread
cards.Add(card); // Once each instance is finished, it is added to the thread-safe bag
});
return cards.ToList();
}
private TrelloCard ProcessCards(Job job)
{
return new TrelloCard();
}
If you want them to run in parallel you could spawn a new Task for each operation and then await the completion of all using Task.WhenAll.
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
List<Task<TrelloCard>> tasks = new List<Task<TrelloCard>>();
foreach (var job in jobs)
{
tasks.Add(ProcessCards(job));
}
var results = await Task.WhenAll(tasks);
return results.ToList();
}
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
jobs.ToList() is just wasting memory. It's already IEnumerable so can be used in a foreach.
ProcessCards doesn't compile. You need something like this
private Task<TrelloCard> ProcessCards(Job job)
{
return Task.Run(() =>
{
System.Threading.Thread.Sleep(2000); //Just for examples sake
return new TrelloCard();
});
}
Now you want ProcessJobs to
create a ProcessCards task for each job
wait for all tasks to finish
return a sequence of TrelloCard
private async Task<List<TrelloCard>> ProcessJobs(IQueryable<IGrouping<CardGrouping, Job>> jobs)
{
return (await Task.WhenAll(jobs.Select(ProcessCards))).ToList();
}

Lazy shared async resource — Clarification?

I saw this example at the end of Stephen's book.
This code can be accessed by more than one thread.
static int _simpleValue;
static readonly Lazy<Task<int>> MySharedAsyncInteger = new Lazy<Task<int>>(
async () =>
{
await Task.Delay(TimeSpan.FromSeconds(2)).ConfigureAwait(false);
return _simpleValue++;
});
async Task GetSharedIntegerAsync()
{
int sharedValue = await MySharedAsyncInteger.Value;
}
No matter how many parts of the code call Value simultaneously, the
Task<int> is only created once and returned to all callers.
But then he says :
If there are different thread types that may call Value (e.g., a UI
thread and a thread-pool thread, or two different ASP.NET request
threads), then it may be better to always execute the asynchronous
delegate on a thread-pool thread.
So he suggests the following code which makes the whole code run in a threadpool thread :
static readonly Lazy<Task<int>> MySharedAsyncInteger = new Lazy<Task<int>>(() => Task.Run(
async () =>
{
await Task.Delay(TimeSpan.FromSeconds(2));
return _simpleValue++;
}));
Question:
I don't understand what the problem with the first code is. The continuation would be executed on a thread-pool thread (due to ConfigureAwait, we don't need the original context).
Also, as soon as control from any thread reaches the await, control returns to the caller.
I don't see what extra risk the second code is trying to resolve.
I mean - what is the problem with "different thread types that may call Value" in the first code?
what is the problem with "different thread types that may call Value"
in the first code?
There is nothing wrong with that code. But imagine you had some CPU-bound work along with the async initialization call. Picture it like this, for example:
static readonly Lazy<Task<int>> MySharedAsyncInteger = new Lazy<Task<int>>(
async () =>
{
int i = 0;
while (i < 5)
{
Thread.Sleep(500);
i++;
}
await Task.Delay(TimeSpan.FromSeconds(2));
return 0;
});
Now, you aren't "guarded" against this kind of operation. I'm assuming Stephen mentioned the UI thread because you shouldn't be doing any operation that's longer than 50ms on it. You don't want your UI thread to freeze, ever.
When you use Task.Run to invoke the delegate, you're covering yourself from places where one might pass a long running delegate to your Lazy<T>.
Stephen Toub talks about this in AsyncLazy:
Here we have a new AsyncLazy<T> that derives from Lazy<Task<T>> and
provides two constructors. Each of the constructors takes a function
from the caller, just as does Lazy<T>. The first constructor, in
fact, takes the same Func that Lazy<T>. Instead of passing that
Func<T> directly down to the base constructor, however, we instead
pass down a new Func<Task<T>> which simply uses StartNew to run the
user-provided Func<T>. The second constructor is a bit more fancy.
Rather than taking a Func<T>, it takes a Func<Task<T>>. With this
function, we have two good options for how to deal with it. The first
is simply to pass the function straight down to the base constructor,
e.g:
public AsyncLazy(Func<Task<T>> taskFactory) : base(taskFactory) { }
That option works, but it means that when a user accesses the Value
property of this instance, the taskFactory delegate will be invoked
synchronously. That could be perfectly reasonable if the taskFactory
delegate does very little work before returning the task instance.
If, however, the taskFactory delegate does any non-negligable work, a
call to Value would block until the call to taskFactory completes. To
cover that case, the second approach is to run the taskFactory using
Task.Factory.StartNew, i.e. to run the delegate itself asynchronously,
just as with the first constructor, even though this delegate already
returns a Task<T>.
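Pieced together from the description quoted above, such a class might look roughly like the sketch below. It is reconstructed from the quoted text rather than copied from a shipped library, so treat the details as assumptions: both constructors push the delegate through Task.Factory.StartNew, and the second one unwraps the resulting nested task.

```csharp
using System;
using System.Threading.Tasks;

// Sketch reconstructed from the quoted description; not a library type.
public class AsyncLazy<T> : Lazy<Task<T>>
{
    // Synchronous factory: run it via StartNew so that reading Value
    // does not execute the caller's delegate inline.
    public AsyncLazy(Func<T> valueFactory)
        : base(() => Task.Factory.StartNew(valueFactory))
    {
    }

    // Asynchronous factory: also run it via StartNew, then unwrap the
    // Task<Task<T>> so callers still see a plain Task<T>.
    public AsyncLazy(Func<Task<T>> taskFactory)
        : base(() => Task.Factory.StartNew(() => taskFactory()).Unwrap())
    {
    }
}
```

Usage would be along the lines of declaring static readonly AsyncLazy<int> shared = new AsyncLazy<int>(async () => { await Task.Delay(2000); return 42; }); and then awaiting shared.Value. Any blocking work at the start of the delegate then runs on whichever scheduler StartNew picks (normally the thread pool) rather than on the thread that first reads Value.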
