Does `AsyncLocal` also do the things that `ThreadLocal` does? - c#

I'm struggling to find simple docs of what AsyncLocal<T> does.
I've written some tests which I think tell me that the answer is "yes", but it would be great if someone could confirm that! (Especially since I don't know how to write tests that would have definitive control of the threads and continuation contexts, so it's possible that they only work coincidentally!)
As I understand it, ThreadLocal will guarantee that if you're on a different thread, then you'll get a different instance of an object.
If you're creating and ending threads, then you might end up re-using the thread again later (and thus arriving on a thread where "that thread's" ThreadLocal object has already been used a bit).
But the interaction with await is less pleasant. The thread that you continue on (even with .ConfigureAwait(true)) is not guaranteed to be the same thread you started on, so you may not get the same object back out of ThreadLocal on the other side.
Conversely, AsyncLocal does guarantee that you'll get the same object either side of an await call.
But I can't find anywhere that actually says that AsyncLocal will get a value that's specific to the initial thread, in the first place!
i.e.:
Suppose you have an instance method (MyAsyncMethod) that references a 'shared' AsyncLocal field (myAsyncLocal) from its class, on either side of an await call.
And suppose that you take an instance of that class and call that method in parallel a bunch of times.
And suppose finally that each invocation happens to end up scheduled on a distinct thread.
I know that for each separate invocation of MyAsyncMethod, myAsyncLocal.Value will return the same object before and after the await (assuming that nothing reassigns it)
But is it guaranteed that each of the invocations will be looking at different objects in the first place?
As mentioned at the start, I've created a test to try to determine this myself. The following test passes consistently:
public class AssessBehaviourOfAsyncLocal
{
private class StringHolder
{
public string HeldString { get; set; }
}
[Test, Repeat(10)]
public void RunInParallel()
{
var reps = Enumerable.Range(1, 100).ToArray();
Parallel.ForEach(reps, index =>
{
var val = "Value " + index;
Assert.AreNotEqual(val, asyncLocalString.Value?.HeldString);
if (asyncLocalString.Value == null)
{
asyncLocalString.Value = new StringHolder();
}
asyncLocalString.Value.HeldString = val;
ExamineValuesOfLocalObjectsEitherSideOfAwait(val).Wait();
});
}
static readonly AsyncLocal<StringHolder> asyncLocalString = new AsyncLocal<StringHolder>();
static async Task ExamineValuesOfLocalObjectsEitherSideOfAwait(string expectedValue)
{
Assert.AreEqual(expectedValue, asyncLocalString.Value.HeldString);
await Task.Delay(100);
Assert.AreEqual(expectedValue, asyncLocalString.Value.HeldString);
}
}

But is it guaranteed that each of the invocations will be looking at different objects in the first place?
No. Think of it logically like a parameter (not ref or out) you pass to a function. Any changes (e.g. setting properties) to the object will be seen by the caller. But if you assign a new value - it won't be seen by the caller.
So in your code sample there are:
Context for the test
-> Context for each of the parallel foreach invocations (some may be "shared" between invocations since parallel will likely reuse threads)
-> Context for the ExamineValuesOfLocalObjectsEitherSideOfAwait invocation
I am not sure if context is the right word - but hopefully you get the right idea.
So the AsyncLocal value will flow (just like a parameter to a function) from the context for the test down into the context for each of the parallel foreach invocations, and so on. This is different from ThreadLocal, which won't flow down like that.
Building on top of your example, have a play with:
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using NUnit.Framework;
namespace NUnitTestProject1
{
public class AssessBehaviourOfAsyncLocal
{
public class Tester
{
public int Value { get; set; }
}
[Test, Repeat(50)]
public void RunInParallel()
{
var newObject = new object();
var reps = Enumerable.Range(1, 5);
Parallel.ForEach(reps, index =>
{
//Thread.Sleep(index * 50); (with or without this line,
Assert.AreEqual(null, asyncLocalString.Value);
asyncLocalObject.Value = newObject;
asyncLocalTester.Value = new Tester() { Value = 1 };
var backgroundTask = new Task(() => {
Assert.AreEqual(null, asyncLocalString.Value);
Assert.AreEqual(newObject, asyncLocalObject.Value);
asyncLocalString.Value = "Bobby";
asyncLocalObject.Value = "Hello";
asyncLocalTester.Value.Value = 4;
Assert.AreEqual("Bobby", asyncLocalString.Value);
Assert.AreNotEqual(newObject, asyncLocalObject.Value);
});
var val = "Value " + index;
asyncLocalString.Value = val;
Assert.AreEqual(newObject, asyncLocalObject.Value);
Assert.AreEqual(1, asyncLocalTester.Value.Value);
backgroundTask.Start();
backgroundTask.Wait();
// Note that the Bobby is not visible here
Assert.AreEqual(val, asyncLocalString.Value);
Assert.AreEqual(newObject, asyncLocalObject.Value);
Assert.AreEqual(4, asyncLocalTester.Value.Value);
ExamineValuesOfLocalObjectsEitherSideOfAwait(val).Wait();
});
}
static readonly AsyncLocal<string> asyncLocalString = new AsyncLocal<string>();
static readonly AsyncLocal<object> asyncLocalObject = new AsyncLocal<object>();
static readonly AsyncLocal<Tester> asyncLocalTester = new AsyncLocal<Tester>();
static async Task ExamineValuesOfLocalObjectsEitherSideOfAwait(string expectedValue)
{
Assert.AreEqual(expectedValue, asyncLocalString.Value);
await Task.Delay(100);
Assert.AreEqual(expectedValue, asyncLocalString.Value);
}
}
}
Notice how backgroundTask is able to see the same async local values as the code that invoked it (even though it runs on another thread). It also doesn't affect the calling code's async local string or object, since it re-assigns them. But the calling code can see its change to Tester (proving that the Task and its calling code share the same Tester instance).
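The "flows down like a parameter" behaviour described above can be reduced to a smaller sketch. The class, field, and method names here (AsyncLocalFlowSketch, Holder, RunAsync) are made up for illustration and are not part of the test above; the sketch assumes the default behaviour where Task.Run captures the caller's execution context.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLocalFlowSketch
{
    // Hypothetical holder type, used only for this illustration.
    public sealed class Holder { public int Number; }

    private static readonly AsyncLocal<string> Name = new AsyncLocal<string>();
    private static readonly AsyncLocal<Holder> Shared = new AsyncLocal<Holder>();

    public static async Task<(string SeenByChild, string NameAfter, int NumberAfter)> RunAsync()
    {
        Name.Value = "parent";
        Shared.Value = new Holder { Number = 1 };

        string seenByChild = null;
        await Task.Run(() =>
        {
            seenByChild = Name.Value;   // the parent's value flowed down into the task
            Name.Value = "child";       // re-assignment: only visible inside the task
            Shared.Value.Number = 4;    // mutation of the shared instance: visible to the parent
        });

        // The parent still sees "parent", but sees the mutated Holder.
        return (seenByChild, Name.Value, Shared.Value.Number);
    }
}
```

Awaiting AsyncLocalFlowSketch.RunAsync() should yield ("parent", "parent", 4): the assignment made inside the task does not flow back up, while the property change on the shared object does.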

Related

Use of Task.WhenAny with dictionary

I am currently writing a service which will create a Task for each request it finds as "Waiting for process" in the database.
The process can be long, and each time the service loops I want it to check whether a task must be cancelled; if so, I want to cancel the task with a token. So I need to store the ID of the request linked to the task.
I was thinking about having my static class with a dictionary as following:
public static Dictionary<Int32, Task<Int32>> _tasks = new Dictionary<int, Task<int>>();
I don't know whether this is the best solution, but I think it is a working one.
Now I want to call Task.WhenAny(..) to know when one of them has ended. The problem is that Task.WhenAny(..) accepts an array, not a Dictionary. I didn't find anything about passing a dictionary to WhenAny, and before I start working on the entire process, which is very long, I want to have a solution for each key point of my workflow. I could get a list of the dictionary's values, but I would probably lose the link to the id. So I don't really know what to do.
Is there a solution for that? I don't want to write my own WhenAny, and I don't even know if that's possible, but I assume I could just check the status of every row. If that's the only option, I will.
I'm also open to the possibility that storing the request id this way isn't a good approach, in which case I welcome any other suggestion.
EDIT: CODE ACCORDING TO ANSWERS
I ended up using this code, which seems to be working. Now I'll test with more complicated tasks rather than just writing a file! :)
public static class Worker
{
public static List<Task<Int32>> m_tasks = new List<Task<Int32>>();
public static Dictionary<Int32, CancellationTokenSource> m_cancellationTokenSources = new Dictionary<int, CancellationTokenSource>();
public static Int32 _testId = 1;
public static void run()
{
//Clean
Cleaner.CleanUploads();
Cleaner.CleanDownloads();
#region thread watching
if (m_tasks.Count > 0)
{
#region thread must be cancel
//Cancel thread
List<Task<Int32>> _removeTemp = new List<Task<Int32>>();
foreach (Task<Int32> _task in m_tasks)
{
if (DbWorker.mustBeCancel((Int32)_task.AsyncState))
{
m_cancellationTokenSources[(Int32)_task.AsyncState].Cancel();
//Cancellation actions
//task must be remove
_removeTemp.Add(_task);
}
}
foreach( Task<Int32> _taskToRemove in _removeTemp)
{
m_tasks.Remove(_taskToRemove);
}
#endregion
#region Conversion lookup
// Get conversion if any
// Create task
CancellationTokenSource _srcCancel = new CancellationTokenSource();
m_cancellationTokenSources.Add(_testId, _srcCancel);
m_tasks.Add(Task.Factory.StartNew(_testId => testRunner<Int32>((Int32)_testId), _testId, _srcCancel.Token));
_testId++;
// Attach task
#endregion
}
#endregion
else
{
CancellationTokenSource _srcCancel = new CancellationTokenSource();
m_cancellationTokenSources.Add(_testId, _srcCancel);
m_tasks.Add(Task.Factory.StartNew(_testId => testRunner<Int32>((Int32)_testId), _testId, _srcCancel.Token));
_testId++;
}
}
internal static void WaitAll()
{
Task.WaitAll(m_tasks.ToArray());
}
public static Int32 testRunner<T>(T _id)
{
for (Int32 i = 0; i <= 1000000; i++)
{
File.AppendAllText(@"C:\TestTemp\" + _id, i.ToString());
}
return 2;
}
}
Task.WhenAny return value is:
A task that represents the completion of one of the supplied tasks.
The return task's Result is the task that completed.
From the docs.
So basically you can pass it the dictionary values, and by awaiting it you get the task that finished. From there it is easy to look up that task's id using some LINQ:
var task = await Task.WhenAny(_tasks.Values);
var id = _tasks.Single(pair => pair.Value == task).Key;
There is a Task.WhenAny that takes an IEnumerable<Task>, and one that takes IEnumerable<Task<T>>, so you should just be able to use:
var winner = await Task.WhenAny(theDictionary.Values);
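Putting the two answers together, a minimal sketch might look like the following. The helper name FirstCompletedAsync and the choice to remove the finished entry from the dictionary are assumptions made for illustration, not something the answers prescribe.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAnyWithIds
{
    // Awaits the first task to finish and returns its key together with its result.
    // The finished entry is removed so the method can be called again in a loop.
    public static async Task<(int Id, int Result)> FirstCompletedAsync(
        IDictionary<int, Task<int>> tasksById)
    {
        // Task.WhenAny returns a Task<Task<int>>; awaiting it yields the task that completed.
        Task<int> finished = await Task.WhenAny(tasksById.Values);

        // Reverse lookup: find the key whose value is the finished task.
        int id = tasksById.First(pair => pair.Value == finished).Key;
        tasksById.Remove(id);

        // Awaiting the finished task returns its result (or rethrows its exception).
        return (id, await finished);
    }
}
```

The reverse lookup is linear in the number of tasks, which is usually fine for a handful of requests; for many tasks, a second dictionary keyed by Task would avoid the scan.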

Task.WhenAll() only executes 2 threads at a time?

In this problem I am trying to cache a single value, let's call it foo. If the value is not cached, then it takes a while to retrieve.
My problem is not implementing it, but testing it.
In order to test it, I fire off 5 simultaneous tasks using Task.WhenAll() to get the cached value. The first one enters the lock and retrieves the value asynchronously, while the other 4 threads should wait on the lock. After waiting, they should one by one re-check the cached value, find that it has been retrieved by the first thread that cached it, and return it without a second retrieval.
[TestClass]
public class Class2
{
private readonly Semaphore semaphore = new Semaphore(1, 1);
private bool? foo;
private async Task<bool> GetFoo()
{
bool fooValue;
// Atomic operation to get current foo
bool? currentFoo = this.foo;
if (currentFoo.HasValue)
{
Console.WriteLine("Foo already retrieved");
fooValue = currentFoo.Value;
}
else
{
semaphore.WaitOne();
{
// Atomic operation to get current foo
currentFoo = this.foo;
if (currentFoo.HasValue)
{
// Foo was retrieved while waiting
Console.WriteLine("Foo retrieved while waiting");
fooValue = currentFoo.Value;
}
else
{
// Simulate waiting to get foo value
Console.WriteLine("Getting new foo");
await Task.Delay(TimeSpan.FromSeconds(5));
this.foo = true;
fooValue = true;
}
}
semaphore.Release();
}
return fooValue;
}
[TestMethod]
public async Task Test()
{
Task[] getFooTasks = new[] {
this.GetFoo(),
this.GetFoo(),
this.GetFoo(),
this.GetFoo(),
this.GetFoo(),
};
await Task.WhenAll(getFooTasks);
}
}
In my actual test and production code, I am retrieving the value through an interface and mocking that interface using Moq. At the end of the test I verify that the interface was only called 1 time (pass), rather than > 1 time (failure).
Output:
Getting new foo
Foo retrieved while waiting
Foo already retrieved
Foo already retrieved
Foo already retrieved
However you can see from the output of the test that it isn't as I expect. It looks as though only 2 of the threads executed concurrently, while the other threads waited until the first two were completed to even enter the GetFoo() method.
Why is this happening? Is it because I'm running it inside a VS unit test? Note that my test still passes, but not in the way I expect it to. I suspect there is some restriction on the number of threads in a VS unit test.
Task.WhenAll() doesn't start the tasks - it just waits for them.
Likewise, calling an async method doesn't actually force parallelization - it doesn't introduce a new thread, or anything like that. You only get new threads if:
You await something which hasn't completed, and your synchronization context schedules the continuation on a new thread (which it won't do in a WinForms context, for example; it'll just reuse the UI thread)
You explicitly use Task.Run, and the task scheduler creates a new thread to run it. (It may not need to, of course.)
You explicitly start a new thread.
To be honest, the use of blocking Semaphore methods in an async method feels very wrong to me. You don't seem to be really embracing the idea of asynchrony... I haven't tried to analyze exactly what your code is going to do, but I think you need to read up more on how async works, and how to best use it.
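The point that calling an async method does not by itself introduce a thread can be checked with a small sketch. The names below are hypothetical; the sketch only records which thread runs the code before the first await in each call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SyncUntilFirstAwait
{
    // Returns the managed thread id observed before the first await.
    private static async Task<int> ThreadBeforeAwaitAsync()
    {
        int id = Environment.CurrentManagedThreadId; // still running on the caller's thread
        await Task.Delay(10);                        // first point where the method can yield
        return id;
    }

    // True if every call ran its pre-await part on the thread that made the calls.
    public static async Task<bool> AllStartedOnCallerThreadAsync(int count)
    {
        int caller = Environment.CurrentManagedThreadId;
        var tasks = new List<Task<int>>();
        for (int i = 0; i < count; i++)
        {
            tasks.Add(ThreadBeforeAwaitAsync()); // returns as soon as the first await is hit
        }

        int[] ids = await Task.WhenAll(tasks);
        return ids.All(id => id == caller);
    }
}
```

All calls are issued from the same thread, one after another; Task.WhenAll only waits for tasks that are already in flight.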
Your problem seems to lie with semaphore.WaitOne().
An async method will run synchronously until it hits its first await. In your code, the first await is only reached after the WaitOne is signaled. The fact that a method is async certainly does not mean it runs on multiple threads; it usually means the opposite.
To get around this, use SemaphoreSlim.WaitAsync; that way the calling thread yields control until the semaphore signals that it's free:
public class Class2
{
private readonly SemaphoreSlim semaphore = new SemaphoreSlim(1, 1);
private bool? foo;
private async Task<bool> GetFoo()
{
bool fooValue;
// Atomic operation to get current foo
bool? currentFoo = this.foo;
if (currentFoo.HasValue)
{
Console.WriteLine("Foo already retrieved");
fooValue = currentFoo.Value;
}
else
{
await semaphore.WaitAsync();
{
// Atomic operation to get current foo
currentFoo = this.foo;
if (currentFoo.HasValue)
{
// Foo was retrieved while waiting
Console.WriteLine("Foo retrieved while waiting");
fooValue = currentFoo.Value;
}
else
{
// Simulate waiting to get foo value
Console.WriteLine("Getting new foo");
await Task.Delay(TimeSpan.FromSeconds(5));
this.foo = true;
fooValue = true;
}
}
semaphore.Release();
}
return fooValue;
}
}
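As a variation on the code above, here is a sketch that releases the semaphore in a finally block, so that an exception during retrieval cannot leave it held forever. The CachedFoo class, the injected retrieve delegate, and the RetrievalCount property are assumptions added for illustration (the delegate stands in for the mocked interface mentioned in the question).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class CachedFoo
{
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);
    private readonly Func<Task<bool>> retrieve; // hypothetical slow retrieval
    private bool? foo;

    // Only incremented while the gate is held.
    public int RetrievalCount { get; private set; }

    public CachedFoo(Func<Task<bool>> retrieve)
    {
        this.retrieve = retrieve;
    }

    public async Task<bool> GetFooAsync()
    {
        // Fast path, mirroring the unsynchronized read in the original code.
        bool? current = foo;
        if (current.HasValue) return current.Value;

        await gate.WaitAsync(); // yields instead of blocking the calling thread
        try
        {
            // Re-check: another caller may have filled the cache while we waited.
            current = foo;
            if (current.HasValue) return current.Value;

            RetrievalCount++;
            bool value = await retrieve();
            foo = value;
            return value;
        }
        finally
        {
            gate.Release(); // released even if retrieve() throws
        }
    }
}
```

With five concurrent calls to GetFooAsync, the first caller performs the retrieval while the other four wait asynchronously on the gate and then find the cached value, so RetrievalCount ends up at 1.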
await Task.Delay(TimeSpan.FromSeconds(5));
This should allow other tasks to run, but I suspect they are blocked on:
semaphore.WaitOne();
Mixing concurrency styles (in this case using Tasks and manual control with a synchronisation object) is very hard to get right.
(You seem to be trying to get the same value by having multiple concurrent tasks all polling: this seems overkill.)
By default the .NET thread pool starts with roughly one worker thread per (logical) CPU core and only adds more gradually; I suspect your system has two cores.

Unenviable duplication of code in C#

I have the following simple method in C#:
private static void ExtendTaskInternal<U>(
ref U task_to_update, U replace, Action a) where U : Task
{
var current = Interlocked.Exchange(ref task_to_update, replace);
if (current == null)
Task.Run(a);
else
current.AppendAction(a);
}
This is used for the following methods:
//A Task can only run once. But sometimes we want to keep a reference to a
//task that can be "restarted". Of course, in this case "restarting" a task
//means replacing the reference with a new one. To do so safely we have to
//check a lot of things:
//
// * Is the reference null?
// * Is the task still running?
// * The replacement of the task must be atomic
//
//This method helps with the above issues. If `task_to_update` is null,
//a new Task is created to replace it. If a task is already there, a new Task
//is created as its continuation, which only runs when the previous
//one finishes.
//
//This looks like an async mutex, since if you assume `ExtendTask` is the only
//function in your code that updates `task_to_update`, the delegates you pass to
//it run sequentially. The difference is that since you have a reference to
//a Task, you can attach continuations that are notified when the lock is
//released.
public static Task<T> ExtendTask<T>(ref Task<T> task_to_update, Func<T> func)
{
var next_ts = new TaskCompletionSource<T>();
ExtendTaskInternal(ref task_to_update, next_ts.Task,
() => next_ts.SetResult(func()));
return next_ts.Task;
}
If you want to do something, but only after something else has already been done, this is useful.
Now, this version can only be used to replace a Task<T>, not a Task, since ref variables are invariant. So if you want it to work for Task as well, you have to duplicate the code:
public static Task<T> ExtendTask<T>(ref Task task_to_update, Func<T> func)
{
var next_ts = new TaskCompletionSource<T>();
ExtendTaskInternal(ref task_to_update, next_ts.Task,
() => next_ts.SetResult(func()));
return next_ts.Task;
}
And so you can implement another version that works on Actions.
public static Task ExtendTask(ref Task task_to_update, Action a)
{
return ExtendTask(ref task_to_update, () =>
{
a();
return true;
});
}
So far so good. But I don't like the first and second versions of ExtendTask, since their bodies look exactly the same.
Is there any way to eliminate the duplication?
Background
People ask why not use ContinueWith.
First, notice that AppendAction is just a wrapper function (from Microsoft.VisualStudio.Threading) of ContinueWith so this code is already using it indirectly.
Second, what I do differently here is that I have a reference to update, so this is another wrapper around ContinueWith; the purpose of these functions is to make it easier to use in some scenarios.
I provide the following concrete example (untested) to illustrate the usage of those methods.
public class Cat {
private Task miuTask = null;
//you have to finish a miu to start another...
private void DoMiu(){
//... do what ever required to "miu".
}
public Task MiuAsync(){
return MyTaskExtension.ExtendTask(ref miuTask, DoMiu);
}
public void RegisterMiuListener(Action whenMiued){
var current = miuTask;
if(current==null) current = TplExtensions.CompletedTask();
current.AppendAction(whenMiued);
}
}

Async Producer/Consumer

I have an instance of a class that is accessed from several threads. This class takes these calls and adds a tuple to a database. I need this to be done serially because, due to some db constraints, parallel threads could result in an inconsistent database.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
public void AddDData(string info)
{
Task t = new Task(() => { InsertDataIntoBase(info); });
_tasks.Add(t);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Task t;
if (_tasks.TryTake(out t))
{
t.Start();
t.Wait();
}
}
});
}
The AddDData is the one who is called by multiple threads and InsertDataIntoBase is a very simple insert that should take few milliseconds.
The problem is that, for some reason that my lack of knowledge doesn't allow me to figure out, sometimes a task is called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and as I was suspecting, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how to properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();
public void AddDData(Data info)
{
_tasks.Add(info);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Data info;
if (_tasks.TryTake(out info))
{
InsertIntoDB(info);
}
}
});
}
Note that I got rid of Tasks, as I'm relying on the synchronous InsertIntoDB call (since it is inside a loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances are going into the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();
static void Main(string[] args)
{
InsertWorker();
GenerateItems(10, 1000);
_itemsToProcess.CompleteAdding();
}
private static void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_itemsToProcess.IsCompleted)
{
string t;
if (_itemsToProcess.TryTake(out t))
{
// Do whatever needs doing here
// Order should be guaranteed since BlockingCollection
// uses a ConcurrentQueue as a backing store by default.
// http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
Console.WriteLine(t);
}
}
});
}
private static void GenerateItems(int count, int maxDelayInMs)
{
Random r = new Random();
string[] items = new string[count];
for (int i = 0; i < count; i++)
{
items[i] = i.ToString();
}
// Simulate many threads adding items to the collection
items
.AsParallel()
.WithDegreeOfParallelism(4)
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.Select((x) =>
{
Thread.Sleep(r.Next(maxDelayInMs));
_itemsToProcess.Add(x);
return x;
}).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure that the caller is not using the same info instance for multiple calls, since that reference is captured in the Task lambda.
Based on the trace that you provided the only logical possibility is that you have called InsertWorker twice (or more). There are thus two background threads waiting for items to appear in the collection and occasionally they both manage to grab an item and begin executing it.
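One way to rule out a second consumer is to start the worker exactly once, inside the object that owns the queue. The sketch below is only an illustration of that idea; the SerialWorker class and its member names are hypothetical, and it uses GetConsumingEnumerable instead of polling with TryTake.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class SerialWorker<T>
{
    private readonly BlockingCollection<T> queue = new BlockingCollection<T>();
    private readonly Task consumer;

    public SerialWorker(Action<T> handle)
    {
        // The single consumer is started here and nowhere else,
        // so two consumers can never compete for items.
        consumer = Task.Factory.StartNew(() =>
        {
            // Blocks until an item is available; the loop ends after CompleteAdding()
            // has been called and the queue has been drained.
            foreach (T item in queue.GetConsumingEnumerable())
            {
                handle(item);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from many producer threads.
    public void Add(T item)
    {
        queue.Add(item);
    }

    // Signals that no more items will arrive and returns the consumer task to wait on.
    public Task CompleteAsync()
    {
        queue.CompleteAdding();
        return consumer;
    }
}
```

Because handle is only ever invoked from the one consumer task, the inserts run strictly one at a time, whatever the number of producer threads.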

Why this order of events in a lambda expression?

This is the code that I'm running:
[TestMethod]
public async Task HelloTest()
{
List<int> hello = new List<int>();
//await Task.WhenAll(Enumerable.Range(0, 2).Select(async x => await Say(hello)));
await Say(hello);
await Say(hello);
}
private static async Task Say(List<int> hello)
{
await Task.Delay(100);
var rep = new Random().Next();
hello.Add(rep);
}
Why is it that running this code, as it is, works as intended and results in two random numbers, but that using the commented code instead always results in two of the exact same number?
So you have several issues here.
First, why are you seeing the same value twice? That's the easy one. When you create a Random instance it is seeded with the current time, but the precision of the current time it uses is rather low. If you create two new Random instances within, say, 16 milliseconds or so (which is a really long time for a computer), you'll see the same values out of them. That's what's happening for you.
Normally the fix for that is just to share a single Random instance, but the problem there is that your random instances aren't being accessed from the same thread (potentially, assuming you don't have a SynchronizationContext specified), and Random isn't thread safe. You can use something like this to get your random numbers instead:
public static class MyRandom
{
private static object key = new object();
private static Random random = new Random();
public static int Next()
{
lock (key)
{
return random.Next();
}
}
//TODO add other methods for other `Random` methods as needed
}
Use that and it will resolve the immediate issue.
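The seeding explanation above comes down to one property of Random: two instances created with the same seed produce the same sequence. The sketch below passes the seed explicitly to make that visible; the class and method names are hypothetical. Note that the clock-based default seed is the .NET Framework behaviour, and newer .NET runtimes pick the default seed differently, so the duplicate values from the question may not reproduce there.

```csharp
using System;

public static class SameSeedSketch
{
    // Two generators built from the same seed return the same first value.
    public static bool SameSeedGivesSameValue(int seed)
    {
        var first = new Random(seed);
        var second = new Random(seed);
        return first.Next() == second.Next();
    }
}
```

SameSeedGivesSameValue returns true for any seed, which is why two `new Random()` calls that end up with the same time-based seed hand back identical numbers.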
The other problem that you have, although it doesn't seem to be biting you currently, is that you're modifying your List from two different tasks, possibly being executed on different threads. You shouldn't do that. It's bad enough practice to have methods like this in a single-threaded environment (as you're relying on side effects to do your work), but in a multithreaded environment this is very problematic, for more than just conceptual reasons. Instead you should have each task return a value, and then pull all of those values into a collection on the caller's side, like so:
public async Task HelloTest()
{
var data = await Task.WhenAll(Say(), Say());
}
private static async Task<int> Say()
{
await Task.Delay(100);
return MyRandom.Next();
}
As to why the two Say calls are run in parallel, rather than sequentially, that has to do with the fact that in your second code snippet you aren't actually waiting for one task to complete before starting the next.
The method that you pass to Select is the method to spin up the task, and it won't block until that task is done before starting the next. The code that you have here:
await Task.WhenAll(Enumerable.Range(0, 2).Select(async x => await Say(hello)));
Is no different than simply having:
await Task.WhenAll(Enumerable.Range(0, 2).Select(x => Say(hello)));
Having an async method that does nothing but await one method call is really no different from just having that one method call. What's happening here is that Select calls Say, starting the first task, then continues on, which starts the next task, and then WhenAll waits (asynchronously) for both tasks to finish before continuing on.
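The ordering described above can be made visible with a small sketch. Everything here is hypothetical scaffolding: a TaskCompletionSource stands in for the delay so that the order of the "start" entries does not depend on timing.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class SelectStartsEverything
{
    public static async Task<string[]> RunAsync()
    {
        var log = new ConcurrentQueue<string>();
        var gate = new TaskCompletionSource<bool>(); // stands in for Task.Delay

        // WhenAll enumerates the Select immediately, so each lambda runs
        // up to its first await before WhenAll even returns.
        Task all = Task.WhenAll(Enumerable.Range(0, 2).Select(async i =>
        {
            log.Enqueue("start " + i);
            await gate.Task;          // not completed yet, so the lambda returns here
            log.Enqueue("end " + i);
        }));

        // Both "start" entries are already in the log at this point.
        gate.SetResult(true);
        await all;
        return log.ToArray();
    }
}
```

The first two log entries are "start 0" and "start 1": the second task is started before the first one has finished, which is the difference from awaiting Say twice in sequence.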
Note: This is an answer to the original question (the question has since been changed).
They operate identically. I ran the below Console application and got the results 0 1 for both versions:
class Program
{
static int m_nextNumber = 0;
static void Main(string[] args)
{
var t1 = Version1();
m_nextNumber = 0;
var t2 = Version2();
Task.WaitAll(t1, t2);
Console.ReadKey();
}
static async Task Version1()
{
List<int> hello = new List<int>();
await Say(hello);
await Say(hello);
PrintHello(hello);
}
static async Task Version2()
{
List<int> hello = new List<int>();
await Task.WhenAll(Enumerable.Range(0, 2).Select(async x => await Say(hello)));
PrintHello(hello);
}
static void PrintHello(List<int> hello)
{
foreach (var i in hello)
Console.WriteLine(i);
}
static int GotANumber()
{
return m_nextNumber++;
}
static async Task Say(List<int> hello)
{
var rep = GotANumber();
hello.Add(rep);
}
}
