How to check that AsyncLocal<T> is accessed within same "async context" - c#

TL;DR ThreadLocal<T>.Value points to the same location as long as Thread.CurrentThread stays the same. Is there anything similar for AsyncLocal<T>.Value (e.g. would SynchronizationContext.Current or ExecutionContext.Capture() suffice for all scenarios)?
Imagine we have created some snapshot of a data structure which is kept in thread-local storage (e.g. a ThreadLocal<T> instance) and passed it to an auxiliary class for later use. This auxiliary class is used to restore the data structure to the snapshot state. We don't want to restore this snapshot onto a different thread, so we can check on which thread the auxiliary class was created. For example:
class Storage<T>
{
    // Initialized with a factory so Value is never null on first use
    // (ImmutableStack<T> comes from System.Collections.Immutable).
    private ThreadLocal<ImmutableStack<T>> stackHolder =
        new ThreadLocal<ImmutableStack<T>>(() => ImmutableStack<T>.Empty);

    public IDisposable Push(T item)
    {
        var bookmark = new StorageBookmark<T>(this);
        stackHolder.Value = stackHolder.Value.Push(item);
        return bookmark;
    }

    private class StorageBookmark<TInner> : IDisposable
    {
        private Storage<TInner> owner;
        private ImmutableStack<TInner> snapshot;
        private Thread boundThread;

        public StorageBookmark(Storage<TInner> owner)
        {
            this.owner = owner;
            this.snapshot = owner.stackHolder.Value;
            this.boundThread = Thread.CurrentThread;
        }

        public void Dispose()
        {
            if (Thread.CurrentThread != boundThread)
                throw new InvalidOperationException("Bookmark crossed thread boundary");
            owner.stackHolder.Value = snapshot;
        }
    }
}
With this, we essentially bound StorageBookmark to a specific thread, and, therefore, to a specific version of the data structure in ThreadLocal storage. And we did that by assuring we don't cross the "thread context", with the help of Thread.CurrentThread.
Now, to the question at hand. How can we achieve the same behavior with AsyncLocal<T> instead of ThreadLocal<T>? To be precise, is there anything similar to Thread.CurrentThread which can be checked at the times of construction and usage to ensure that the "async context" has not been crossed (meaning AsyncLocal<T>.Value would point to the same object as when the bookmark was constructed)?
It seems either SynchronizationContext.Current or ExecutionContext.Capture() may suffice, but I'm not sure which is better, that there is no catch, or even that either would work in all possible situations.

What you're hoping to do is fundamentally contrary to the nature of asynchronous execution context. Nothing requires (and therefore nothing can guarantee) that all Tasks created within your asynchronous context will be awaited immediately, in the same order they were created, or ever at all; but their creation within the scope of the calling context makes them part of the same asynchronous context, period.
It may be challenging to think of asynchronous execution context as different from thread context, but asynchrony is not synonymous with parallelism, which is specifically what logical threads support. Objects stored in Thread Local Storage that aren't intended to be shared/copied across threads can generally be mutable, because execution within a logical thread is guaranteed to be relatively constrained, sequential logic (though some special treatment may be necessary to ensure compile-time optimizations don't mess with you, this is rare and only necessary in very specific scenarios). For that reason, the ThreadLocal in your example doesn't really need to hold an ImmutableStack; it could just be a Stack (which has much better performance), since you don't need to worry about copy-on-write or concurrent access. If the stack were publicly accessible it would be more concerning that someone could pass it to other threads which could push/pop items, but since it's a private implementation detail here, the ImmutableStack could actually be seen as unnecessary complexity.
Anyway, Execution Context is not a concept unique to .NET (implementations on other platforms may differ in some ways, though in my experience never by much). It is very much like (and directly related to) the call stack, but in a way that considers new asynchronous tasks to be new calls on the stack which may need both to share the caller's state as it was at the time the operation started, and to diverge, since the caller may continue to create more tasks and create/update state in ways that would not make logical sense when reading a sequential set of instructions. It is generally recommended that anything placed in the ExecutionContext be immutable, though in some cases all copies of the context still pointing to the same instance reference should necessarily share mutable data. HttpContext, for instance, is stored on the default implementation of IHttpContextAccessor using AsyncLocal, so all tasks created in the scope of a single request have access to the same response state, for example.
Allowing multiple concurrent contexts to make mutations to the same reference instance necessarily introduces the possibility of issues, both from concurrency and from logical order of execution. For instance, multiple tasks trying to set different results on an HTTP response will either result in an exception or unexpected behavior. You can try, to some extent, to help the consumer here, but at the end of the day it is the consumer's responsibility to understand the complexity of the nuanced implementation details they're dependent on (which is generally a code smell, but sometimes a necessary evil in real-world situations).
That scenario aside, as said, for the sake of ensuring all nested contexts function predictably and safely, it's generally recommended to only store immutable types and to always restore the context to its previous value (as you're doing with your disposable stack mechanism). The easiest way to think of the copy-on-write behavior is as though every single new task, new thread pool work item, and new thread gets its own clone of the context; but if they point to the same reference type (i.e. all have copies of the same reference pointer) they all see the same instance. The copy-on-write is simply an optimization that avoids copying when unnecessary; it can essentially be ignored, and you can think of every logical task as having its own copy of the context (much like that ImmutableStack, or a string). If the only way to update anything the immutable collection points to is to reassign it to a new modified instance, then you never have to worry about cross-context pollution (just like with that ImmutableStack you're using).
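A quick demonstration of that mental model (a minimal sketch; AsyncLocal lives in System.Threading):
using System;
using System.Threading;
using System.Threading.Tasks;

class CopyOnWriteDemo
{
    private static readonly AsyncLocal<string> Ambient = new AsyncLocal<string>();

    static async Task Main()
    {
        Ambient.Value = "parent";

        await Task.Run(() =>
        {
            Console.WriteLine(Ambient.Value); // "parent": the child starts with a copy
            Ambient.Value = "child";          // reassignment affects only the child's context
        });

        Console.WriteLine(Ambient.Value); // still "parent": the child's write never flowed back
    }
}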
Your example doesn't show anything about how the data is accessed or what types are passed in for T, so there's no way to see what issue you might face; but if what you're concerned about is nested tasks disposing the "Current" context, or the IDisposable value being assigned to a field somewhere and accessed from a different thread, there are a few things you can try and some points worth considering:
The closest equivalent to your current check would be verifying that the ambient value is still the one this bookmark produced (expectedTop here being a hypothetical field capturing owner.stackHolder.Value right after the Push that created the bookmark):
if (owner.stackHolder.Value != expectedTop)
    throw new InvalidOperationException("Bookmark disposed out of order or in wrong context");
A simple disposed flag that throws ObjectDisposedException will surface the problem in at least one context if two contexts try to dispose the same bookmark.
Though this generally isn't recommended, if you want to be absolutely certain the object was disposed at least once you could throw an exception in the finalizer of the IDisposable implementation (being sure to call GC.SuppressFinalize(this) in the Dispose method).
By combining the previous two, while it won't guarantee that it was disposed in the exact same task/method block that created it, you can at least guarantee that the object is disposed once and only once (a sketch follows).
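Combining those two might look like this (a sketch; the finalizer throw is the generally-not-recommended part, since an exception on the finalizer thread tears down the process):
private sealed class Bookmark : IDisposable
{
    private int disposed; // 0 = live, 1 = disposed

    public void Dispose()
    {
        // Interlocked guarantees exactly one caller wins a race to dispose.
        if (System.Threading.Interlocked.Exchange(ref disposed, 1) != 0)
            throw new ObjectDisposedException(nameof(Bookmark));
        GC.SuppressFinalize(this);
        // ... restore the snapshot here ...
    }

    ~Bookmark()
    {
        // Reaching the finalizer means Dispose was never called.
        throw new InvalidOperationException("Bookmark was never disposed.");
    }
}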
Due to the fundamental importance of the way ExecutionContext is flowed and controlled, it is the responsibility of the execution engine (typically the runtime, task scheduler, etc., but also any third party using Tasks/Threads in novel ways) to ensure ExecutionContext flow is captured and suppressed where appropriate. If a thread, scheduler, or synchronization migration occurs in the root context, the ExecutionContext should not be flowed into the next logical task the thread/scheduler processes in the context where the task formerly executed. For example, if a task continuation starts on a ThreadPool thread and then awaits something that causes the next logical operations to continue on a different ThreadPool thread (or some other I/O completion thread), then when the original thread is returned to the ThreadPool it should not continue to reference/flow the ExecutionContext of the task which is no longer logically executing within it. Assuming no additional tasks are created in parallel and left astray, once execution resumes in the root awaiter it will be the only execution context that continues to have a reference to the context. When a Task completes, so does its execution context (or, rather, its copy of it).
Even if unobserved background tasks are started and never awaited, if the data stored in the AsyncLocal is immutable, the copy-on-write behavior combined with your immutable stack will ensure that parallel clones of execution contexts can never pollute each other.
With the first check in place and immutable types in use, you really don't need to worry about cloned parallel execution contexts, unless you're worried about them gaining access to sensitive data from previous contexts. When they Dispose the current item, only the stack of the current execution context (i.e. the nested parallel context, specifically) reverts to the previous value; all cloned contexts (including the parent) are not modified.
If you are worried about nested contexts accessing parent data by disposing things they shouldn't, there are relatively simple patterns you can use to separate the IDisposable from the ambient value, as well as suppression patterns like those used in TransactionScope to, say, temporarily set the current value to null (a sketch follows).
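A sketch of that suppression idea, modeled on TransactionScope (the names here are illustrative, not an existing API; assumes using System and System.Threading):
static class AmbientSuppressor
{
    // Temporarily hide the ambient value; disposing the scope puts it back.
    public static IDisposable Suppress<T>(AsyncLocal<T> ambient) where T : class
    {
        T previous = ambient.Value;
        ambient.Value = null; // code inside the scope sees no ambient value
        return new Restorer(() => ambient.Value = previous);
    }

    private sealed class Restorer : IDisposable
    {
        private readonly Action restore;
        public Restorer(Action restore) { this.restore = restore; }
        public void Dispose() { restore(); } // the previous value is restored here
    }
}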
Just to reiterate in a practical way, let's say, for instance, that you store an ImmutableList in one of your bookmarks. If the item stored in the ImmutableList is mutable then context pollution is possible.
var someImmutableListOfMutableItems = unsafeAsyncLocal.Value;

// Sets Name for all contexts pointing to the same reference.
someImmutableListOfMutableItems[0].Name = "Jon"; // runs the property setter on the shared Person reference

// Notice how unsafeAsyncLocal.Value never had to be reassigned?
Whereas an immutable collection of immutable items will never pollute another context unless something is super fundamentally wrong about how execution context is being flowed (contact a vendor, file a bug report, raise the alarm, etc.)
var someImmutableListOfImmutableItems = safeAsyncLocal.Value;

someImmutableListOfImmutableItems = someImmutableListOfImmutableItems.SetItem(0,
    someImmutableListOfImmutableItems[0].SetName("Jon") // SetName returns a new immutable Person instance
); // SetItem returns a new immutable list instance

// Notice both the item and the collection need to be reassigned. No other context will be polluted here.
safeAsyncLocal.Value = someImmutableListOfImmutableItems;
EDIT: Some articles for people who want to read something perhaps more coherent than my ramblings here :)
https://devblogs.microsoft.com/pfxteam/executioncontext-vs-synchronizationcontext/
https://weblogs.asp.net/dixin/understanding-c-sharp-async-await-3-runtime-context
And for some comparison, here's an article about how context is managed in JavaScript, which is single threaded but supports an asynchronous programming model (which I figure might help to illustrate how they relate/differ):
https://blog.bitsrc.io/understanding-execution-context-and-execution-stack-in-javascript-1c9ea8642dd0

The logical call context has the same flow semantics as the execution context, and therefore as AsyncLocal. Knowing that, you can store a value in the logical context to detect when you cross "async context" boundaries:
class Storage<T>
{
    private AsyncLocal<ImmutableStack<T>> stackHolder = new AsyncLocal<ImmutableStack<T>>();

    public IDisposable Push(T item)
    {
        var bookmark = new StorageBookmark<T>(this);
        stackHolder.Value = (stackHolder.Value ?? ImmutableStack<T>.Empty).Push(item);
        return bookmark;
    }

    private class StorageBookmark<TInner> : IDisposable
    {
        private Storage<TInner> owner;
        private ImmutableStack<TInner> snapshot;
        private readonly object id;

        public StorageBookmark(Storage<TInner> owner)
        {
            id = new object();
            this.owner = owner;
            this.snapshot = owner.stackHolder.Value;
            // CallContext is in System.Runtime.Remoting.Messaging.
            CallContext.LogicalSetData("AsyncStorage", id);
        }

        public void Dispose()
        {
            if (CallContext.LogicalGetData("AsyncStorage") != id)
                throw new InvalidOperationException("Bookmark crossed async context boundary");
            owner.stackHolder.Value = snapshot;
        }
    }
}

public class Program
{
    static void Main()
    {
        DoesNotThrow().Wait();
        Throws().Wait();
    }

    static async Task DoesNotThrow()
    {
        var storage = new Storage<string>();
        using (storage.Push("hello"))
        {
            await Task.Yield();
        }
    }

    static async Task Throws()
    {
        var storage = new Storage<string>();
        var disposable = storage.Push("hello");
        using (ExecutionContext.SuppressFlow())
        {
            Task.Run(() => { disposable.Dispose(); }).Wait();
        }
    }
}
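Note that CallContext lives in System.Runtime.Remoting.Messaging and exists only on .NET Framework. On .NET Core / modern .NET, a second AsyncLocal flows with the ExecutionContext in the same way and can serve as the marker; a sketch of that substitution inside StorageBookmark:
// .NET Core substitution: an AsyncLocal used purely as a marker.
private static readonly AsyncLocal<object> marker = new AsyncLocal<object>();

public StorageBookmark(Storage<TInner> owner)
{
    id = new object();
    this.owner = owner;
    this.snapshot = owner.stackHolder.Value;
    marker.Value = id; // replaces CallContext.LogicalSetData
}

public void Dispose()
{
    if (!ReferenceEquals(marker.Value, id)) // replaces CallContext.LogicalGetData
        throw new InvalidOperationException("Bookmark crossed async context boundary");
    owner.stackHolder.Value = snapshot;
}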

Related

Is it possible to modify an object passed like a parameter to other thread with C#? [duplicate]

Is there any way in C# to put objects in another thread? All I found is how to execute some methods in another thread. What I actually want to do is to instantiate an object in a new thread for later use of the methods it provides.
Hope you can help me,
Russo
Objects do not really belong to a thread. If you have a reference to an object, you can access it from many threads.
This can give problems with objects that are not designed to be accessed from many threads, like (almost all) System.Windows.Forms classes, and access to COM objects.
If you only want to access an object from the same thread, store a reference to the thread in the object (or a wrapping object), and execute the methods via that thread.
There seems to be some confusion about how threads work here, so this is a primer (very short too, so you should find more material before venturing further into multi-threaded programming.)
Objects and memory are inherently multi-thread in the sense that all threads in a process can access them as they choose.
So objects do not have anything to do with threads.
However, code executes in a thread, and it is the thread the code executes in that you're probably after.
Unfortunately there is no way to just "put an object into a different thread" as you put it, you need to specifically start a thread and specify what code to execute in that thread. Objects used by that code can thus be "said" to belong to that thread, though that is an artificial limit you impose yourself.
So there is no way to do this:
SomeObject obj = new SomeObject();
obj.PutInThread(thatOtherThread);
obj.Method(); // this now executes in that other thread
In fact, a common trap many new multi-threaded programmers fall into is believing that if they create an object in one thread, and call methods on it from another thread, all those methods execute in the thread that created the object. This is incorrect: methods always execute in the thread that called them.
So the following is also incorrect:
Thread 1:
SomeObject obj = new SomeObject();

Thread 2:
obj.Method(); // executes in Thread 1 (this comment is the incorrect assumption)
The method here will execute in Thread 2. The only way to get the method to execute in the original thread is to cooperate with the original thread and "ask it" to execute that method. How you do that depends on the situation and there's many many ways to do this.
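For illustration, one common shape of that cooperation is a work queue that the owning thread consumes (a minimal sketch):
using System;
using System.Collections.Concurrent;
using System.Threading;

class Executor
{
    static void Main()
    {
        var workItems = new BlockingCollection<Action>();

        // The "owning" thread runs every delegate posted to it, in order.
        var executor = new Thread(() =>
        {
            foreach (var work in workItems.GetConsumingEnumerable())
                work();
        });
        executor.Start();

        // Any other thread can now "ask" the executor thread to run a method:
        workItems.Add(() => Console.WriteLine(
            "Running on thread " + Thread.CurrentThread.ManagedThreadId));

        workItems.CompleteAdding(); // shut the executor loop down
        executor.Join();
    }
}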
So to summarize what you want: You want to create a new thread, and execute code in that thread.
To do that, look at the Thread class of .NET.
But be warned: Multi-threaded applications are exceedingly hard to get correct, I would not add multi-threaded capabilities to a program unless:
That is the only way to get more performance out of it
And, you know what you're doing
All threads of a process share the same data (ignoring thread local storage) so there is no need to explicitly migrate objects between threads.
internal sealed class Foo
{
    private Object bar = null;

    private void CreateBarOnNewThread()
    {
        var thread = new Thread(this.CreateBar);
        thread.Start();

        // Do other stuff while the new thread
        // creates our bar.
        Console.WriteLine("Doing crazy stuff.");

        // Wait for the other thread to finish.
        thread.Join();

        // Use this.bar here...
    }

    private void CreateBar()
    {
        // Creating a bar takes a long time.
        Thread.Sleep(1000);
        this.bar = new Object();
    }
}
All threads can see the heap, so if the thread has a reference to the objects you need (passed in through a method, for example) then the thread can use those objects. This is why you have to be very careful accessing objects when multi-threading, as two threads might try to change the object at the same time.
There is a ThreadLocal<T> class in .NET that you can use to restrict variables to a specific thread: see http://msdn.microsoft.com/en-us/library/dd642243.aspx and http://www.c-sharpcorner.com/UploadFile/ddoedens/UseThreadLocals11212005053901AM/UseThreadLocals.aspx
Use ParameterizedThreadStart to pass an object to your thread.
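For example (a minimal sketch):
using System;
using System.Threading;

class Worker
{
    static void Main()
    {
        var thread = new Thread(new ParameterizedThreadStart(Work));
        thread.Start("some payload"); // the argument is delivered as object
        thread.Join();
    }

    // ParameterizedThreadStart requires exactly this signature.
    static void Work(object state)
    {
        var payload = (string)state; // cast back to the real type
        Console.WriteLine(payload);
    }
}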
"for later use of the methods it provides."
Using a class that contains method to execute on new thread and other data and methods, you can gain access from your thread to Data and methods from the new thread.
But ... if your execute a method from the class, you are executing on current thread.
To execute the method on the new thread needs some Thread syncronization.
System.Windows.Forms.Control.BeginInvoke do it, the Control thread is waiting until a request arrives.
WaitHandle class can help you.
There's a lot of jargon around threading, but it boils down to something pretty simple.
For a simple program, you have one point of execution flowing from point a to b, one line at a time. Programming 101, right?
OK, for multithreading, you now have more than one point of execution in your program. So, point 1 can be in one part of your program, and point 2 can be someplace else.
It's all the same memory, data and code, but you have more than one thing happening at a time. So, you can think: what happens if both points enter a loop at the same time? Techniques were created either to keep that kind of issue from happening, or to speed up some kind of process (counting a value vs., say, networking).
That's all it really is. It can be tricky to manage, and it's easy to get lost in the jargon and theory, but keep this in mind and it will be much simpler.
There are other exceptions to the rule as always, but this is the basics of it.
If the method that you run in a thread resides in a custom class, you can have members of this class hold the parameters.
public class Foo
{
    private readonly object parameter1;
    private readonly object parameter2;

    public Foo(object parameter1, object parameter2)
    {
        this.parameter1 = parameter1;
        this.parameter2 = parameter2;
    }

    public void ThreadMethod()
    {
        // ... use parameter1 and parameter2 here ...
    }
}
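Usage would then look something like this (using the constructor above):
var foo = new Foo("first", "second");
var thread = new Thread(foo.ThreadMethod); // instance method as the thread's entry point
thread.Start();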
Sorry to duplicate some previous work, but the OP said
What I actually want to do is to instanciate an object in a new thread for later use of the methods it provides.
Let me interpret that as:
What I actually want to do is have a new thread instantiate an object so that later I can use that object's methods.
Pray correct me if I've missed the mark. Here's the example:
namespace silly
{
    public static class Program
    {
        //declared volatile to make sure the object is in a consistent state
        //between thread usages -- For thread safety.
        public static volatile Object_w_Methods _method_provider = null;

        static void Main(string[] args)
        {
            //right now, _method_provider is null.
            System.Threading.Thread _creator_thread = new System.Threading.Thread(
                new System.Threading.ThreadStart(Create_Object));
            _creator_thread.Name = "Thread for creation of object";
            _creator_thread.Start();

            //here I can do other work while _method_provider is created.
            System.Threading.Thread.Sleep(256);
            _creator_thread.Join();

            //by now, the other thread has created the _method_provider
            //so we can use its methods in this thread, and any other thread!
            System.Console.WriteLine("I got the name!! It is: `" +
                _method_provider.Get_Name(1) + "'");
            System.Console.WriteLine("Press any key to exit...");
            System.Console.ReadKey(true);
        }

        static void Create_Object()
        {
            System.Threading.Thread.Sleep(512);
            _method_provider = new Object_w_Methods();
        }
    }

    public class Object_w_Methods
    {
        //Synchronize because it will probably be used by multiple threads,
        //even though the current implementation is thread safe.
        [System.Runtime.CompilerServices.MethodImpl(
            System.Runtime.CompilerServices.MethodImplOptions.Synchronized)]
        public string Get_Name(int id)
        {
            switch (id)
            {
                case 1:
                    return "one is the name";
                case 2:
                    return "two is the one you want";
                default:
                    return "supply the correct ID.";
            }
        }
    }
}
I'd just like to elaborate on a previous answer. To get back to the problem: objects and memory space are shared by all threads. So they are always shared, but I am assuming you want to do so safely and work with results created by another thread.
Firstly, try one of the trusted C# patterns: Async Patterns
There are set patterns to work with that transmit basic messages and data between threads.
Usually one thread completes after it computes the results!
Live threads: nothing is foolproof when going asynchronous and sharing data with live threads.
So basically, keep it as simple as possible if you do need to go this route, and try to follow known patterns.
Now I'd just like to elaborate on why some of the known patterns have a certain structure:
EventArgs: you create a deep copy of the objects before passing them. (It is not foolproof, because certain references might still be shared.)
Passing results with basic types like ints, floats, etc.: these can be set in a constructor and made immutable.
Use atomic keywords on these types, or create monitors, etc. Stick to one thread reading and the other writing.
Assuming you have complex data you'd like to work with on two threads simultaneously, here are completely different ways to solve this, which I have not yet tested:
You could store results in a database and let the other executable read them. (There, locks occur at a row level, but you can try again or change the SQL code, and at least you will get reported deadlocks that can be solved with good design, rather than just hanging software!) I would only do this if it actually makes sense to store the data in a database for other reasons.
Another approach is to program in F#. There, objects and all types are immutable by default, so your objects to be shared should have a constructor, and no methods should allow the object to be changed or basic types to be incremented.
So you create them and then they don't change! They are immutable after that.
That makes locking them and working with them in parallel so much easier. Don't go crazy with this in C# classes, because others might not follow this "convention", and most things like Lists were just not designed to be immutable in C# (readonly is not the same as immutable; const is, but it is very limiting). Immutable versus readonly

C# lock based on class property

I've seen many examples of the lock usage, and it's usually something like this:
private static readonly object obj = new object();

lock (obj)
{
    // code here
}
Is it possible to lock based on a property of a class? I didn't want to lock globally for any calls to the method with the lock statement, I'd like to lock only if the object passed as argument had the same property value as another object which was being processed prior to that.
Is that possible? Does that make sense at all?
This is what I had in mind:
public class GmailController : Controller
{
    private static readonly ConcurrentQueue<PushRequest> queue = new ConcurrentQueue<PushRequest>();

    [HttpPost]
    public IActionResult ProcessPushNotification(PushRequest push)
    {
        var existingPush = queue.FirstOrDefault(q => q.Matches(push));
        if (existingPush == null)
        {
            queue.Enqueue(push);
            existingPush = push;
        }
        try
        {
            // lock if there is an existing push in the
            // queue that matches the requested one
            lock (existingPush)
            {
                // process the push notification
            }
            return Ok(); // (return statement omitted in the original question)
        }
        finally
        {
            queue.TryDequeue(out existingPush);
        }
    }
}
Background: I have an API where I receive push notifications from Gmail's API when our users send/receive emails. However, if someone sends a message to two users at the same time, I get two push notifications. My first idea was querying the database before inserting (based on subject, sender, etc). In some rare cases, the query of the second call is made before the SaveChanges of the previous call, so I end up having duplicates.
I know that if I ever wanted to scale out, lock would become useless. I also know I could just create a job to check recent entries and eliminate duplicates, but I was trying something different. Any suggestions are welcome.
Let me first make sure I understand the proposal. The problem given is that we have some resource shared to multiple threads, call it database, and it admits two operations: Read(Context) and Write(Context). The proposal is to have lock granularity based on a property of the context. That is:
void MyRead(Context c)
{
    lock (c.P) { database.Read(c); }
}

void MyWrite(Context c)
{
    lock (c.P) { database.Write(c); }
}
So now if we have a call to MyRead where the context property has value X, and a call to MyWrite where the context property has value Y, and the two calls are racing on two different threads, they are not serialized. However, if we have, say, two calls to MyWrite and a call to MyRead, and in all of them the context property has value Z, those calls are serialized.
Is this possible? Yes. That doesn't make it a good idea. As implemented above, this is a bad idea and you shouldn't do it.
It is instructive to learn why it is a bad idea.
First, this simply fails if the property is a value type, like an integer. You might think, well, my context is an ID number, that's an integer, and I want to serialize all accesses to the database using ID number 123, and serialize all accesses using ID number 345, but not serialize those accesses with respect to each other. Locks only work on reference types, and boxing a value type always gives you a freshly allocated box, so the lock would never be contested even if the ids were the same. It would be completely broken.
Second, it fails badly if the property is a string. Locks are logically "compared" by reference, not by value. With boxed integers, you always get different references. With strings, you sometimes get different references! (Because of interning being applied inconsistently.) You could be in a situation where you are locking on "ABC" and sometimes another lock on "ABC" waits, and sometimes it does not!
But the fundamental rule that is broken is: you must never lock on an object unless that object has been specifically designed to be a lock object, and the same code which controls access to the locked resource controls access to the lock object.
The problem here is not "local" to the lock but rather global. Suppose your property is a Frob where Frob is a reference type. You don't know if any other code in your process is also locking on that same Frob, and therefore you don't know what lock ordering constraints are necessary to prevent deadlocks. Whether a program deadlocks or not is a global property of a program. Just like you can build a hollow house out of solid bricks, you can build a deadlocking program out of a collection of locks that are individually correct. By ensuring that every lock is only taken out on a private object that you control, you ensure that no one else is ever locking on one of your objects, and therefore the analysis of whether your program contains a deadlock becomes simpler.
Note that I said "simpler" and not "simple". It reduces it to almost impossible to get correct, from literally impossible to get correct.
So if you were hell bent on doing this, what would be the right way to do it?
The right way would be to implement a new service: a lock object provider. LockProvider<T> needs to be able to hash and compare for equality two Ts. The service it provides is: you tell it that you want a lock object for a particular value of T, and it gives you back the canonical lock object for that T. When you're done, you say you're done. The provider keeps a reference count of how many times it has handed out a lock object and how many times it got it back, and deletes it from its dictionary when the count goes to zero, so that we don't have a memory leak.
Obviously the lock provider needs to be threadsafe and needs to be extremely low contention, because it is a mechanism designed to prevent contention, so it had better not cause any! If this is the road you intend to go down, you need to get an expert on C# threading to design and implement this object. It is very easy to get this wrong. As I have noted in comments to your post, you are attempting to use a concurrent queue as a sort of poor lock provider and it is a mass of race condition bugs.
This is some of the hardest code to get correct in all of .NET programming. I have been a .NET programmer for almost 20 years and implemented parts of the compiler and I do not consider myself competent to get this stuff right. Seek the help of an actual expert.
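For what it's worth, a minimal sketch of the shape such a provider might take (LockProvider<T> and its Get/Return surface are the hypothetical API described above; this is illustrative, not the vetted, low-contention implementation being called for):
public sealed class LockProvider<T>
{
    private sealed class Entry
    {
        public readonly object LockObject = new object();
        public int RefCount;
    }

    private readonly System.Collections.Generic.Dictionary<T, Entry> entries =
        new System.Collections.Generic.Dictionary<T, Entry>();
    private readonly object gate = new object();

    // Hands out the canonical lock object for this value of T.
    public object Get(T key)
    {
        lock (gate)
        {
            if (!entries.TryGetValue(key, out var entry))
                entries[key] = entry = new Entry();
            entry.RefCount++;
            return entry.LockObject;
        }
    }

    // Callers say they're done; the entry is removed at refcount zero
    // so the dictionary does not leak.
    public void Return(T key)
    {
        lock (gate)
        {
            var entry = entries[key];
            if (--entry.RefCount == 0)
                entries.Remove(key);
        }
    }
}

// Usage: var l = provider.Get(id); try { lock (l) { /* serialized per id */ } }
// finally { provider.Return(id); }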
Although I find Eric Lippert's answer fantastic and marked it as the correct one (and I won't change that), his thoughts made me think, and I wanted to share an alternative solution I found to this problem (I'd appreciate any feedback), even though I'm not going to use it, as I ended up using Azure Functions with my code (so this wouldn't make sense) plus a cron job to detect and eliminate possible duplicates.
public class UserScopeLocker : IDisposable
{
    private static readonly object _obj = new object();
    private static ICollection<string> UserQueue = new HashSet<string>();

    private readonly string _userId;

    protected UserScopeLocker(string userId)
    {
        this._userId = userId;
    }

    public static UserScopeLocker Acquire(string userId)
    {
        while (true)
        {
            lock (_obj)
            {
                if (UserQueue.Contains(userId))
                {
                    continue;
                }
                UserQueue.Add(userId);
                return new UserScopeLocker(userId);
            }
        }
    }

    public void Dispose()
    {
        lock (_obj)
        {
            UserQueue.Remove(this._userId);
        }
    }
}
...then you would use it like this:
[HttpPost]
public IActionResult ProcessPushNotification(PushRequest push)
{
    using (var scope = UserScopeLocker.Acquire(push.UserId))
    {
        // process the push notification
        // two threads can't enter here for the same UserId
        // the second one will be blocked until the first disposes
        return Ok(); // (return statement omitted in the original)
    }
}
The idea is:
UserScopeLocker has a protected constructor, ensuring you call Acquire.
_obj is private static readonly, only the UserScopeLocker can lock this object.
_userId is a private readonly field, ensuring even its own class can't change its value.
lock is done when checking, adding and removing, so two threads can't compete on these actions.
Possible flaws I detected:
Since UserScopeLocker relies on IDisposable to release a given UserId, I can't guarantee the caller will properly use a using statement (or manually dispose the scope object).
I can't guarantee the scope won't be used in a recursive function (thus possibly causing a deadlock).
I can't guarantee the code inside the using statement won't call another function which also tries to acquire a scope to the user (this would also cause a deadlock).

Should thread-safe class have a memory barrier at the end of its constructor?

When implementing a class intended to be thread-safe, should I include a memory barrier at the end of its constructor, in order to ensure that any internal structures have completed being initialized before they can be accessed? Or is it the responsibility of the consumer to insert the memory barrier before making the instance available to other threads?
Simplified question:
Is there a race hazard in the code below that could give erroneous behaviour due to the lack of a memory barrier between the initialization and the access of the thread-safe class? Or should the thread-safe class itself protect against this?
ConcurrentQueue<int> queue = null;

Parallel.Invoke(
    () => queue = new ConcurrentQueue<int>(),
    () => queue?.Enqueue(5));
Note that it is acceptable for the program to enqueue nothing, as would happen if the second delegate executes before the first. (The null-conditional operator ?. protects against a NullReferenceException here.) However, it should not be acceptable for the program to throw an IndexOutOfRangeException, NullReferenceException, enqueue 5 multiple times, get stuck in an infinite loop, or do any of the other weird things caused by race hazards on internal structures.
Elaborated question:
Concretely, imagine that I were implementing a simple thread-safe wrapper for a queue. (I'm aware that .NET already provides ConcurrentQueue<T>; this is just an example.) I could write:
public class ThreadSafeQueue<T>
{
    private readonly Queue<T> _queue;

    public ThreadSafeQueue()
    {
        _queue = new Queue<T>();
        // Thread.MemoryBarrier(); // Is this line required?
    }

    public void Enqueue(T item)
    {
        lock (_queue)
        {
            _queue.Enqueue(item);
        }
    }

    public bool TryDequeue(out T item)
    {
        lock (_queue)
        {
            if (_queue.Count == 0)
            {
                item = default(T);
                return false;
            }

            item = _queue.Dequeue();
            return true;
        }
    }
}
This implementation is thread-safe, once initialized. However, if the initialization itself is raced by another consumer thread, then race hazards could arise, whereby the latter thread would access the instance before the internal Queue<T> has been initialized. As a contrived example:
ThreadSafeQueue<int> queue = null;

Parallel.For(0, 10000, i =>
{
    if (i == 0)
        queue = new ThreadSafeQueue<int>();
    else if (i % 2 == 0)
        queue?.Enqueue(i);
    else
    {
        int item = -1;
        if (queue?.TryDequeue(out item) == true)
            Console.WriteLine(item);
    }
});
It is acceptable for the code above to miss some numbers; however, without the memory barrier, it could also be getting a NullReferenceException (or some other weird result) due to the internal Queue<T> not having been initialized by the time that Enqueue or TryDequeue are called.
Is it the responsibility of the thread-safe class to include a memory barrier at the end of its constructor, or is it the consumer who should include a memory barrier between the class's instantiation and its visibility to other threads? What is the convention in the .NET Framework for classes marked as thread-safe?
Edit: This is an advanced threading topic, so I understand the confusion in some of the comments. An instance can appear as half-baked if accessed from other threads without proper synchronization. This topic is discussed extensively within the context of double-checked locking, which is broken under the ECMA CLI specification without the use of memory barriers (such as through volatile). Per Jon Skeet:
The Java memory model doesn't ensure that the constructor completes before the reference to the new object is assigned to instance. The Java memory model underwent a reworking for version 1.5, but double-check locking is still broken after this without a volatile variable (as in C#).
Without any memory barriers, it's broken in the ECMA CLI specification too. It's possible that under the .NET 2.0 memory model (which is stronger than the ECMA spec) it's safe, but I'd rather not rely on those stronger semantics, especially if there's any doubt as to the safety.
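For reference, the classic double-checked locking pattern that quote is about looks like this in C# (a sketch; the volatile keyword is what publishes the fully constructed object):
public sealed class Singleton
{
    private static volatile Singleton instance; // volatile: no half-constructed publish
    private static readonly object padlock = new object();

    public static Singleton Instance
    {
        get
        {
            if (instance == null)         // first check, without the lock
            {
                lock (padlock)
                {
                    if (instance == null) // second check, under the lock
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }
}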
Lazy<T> is a very good choice for Thread-Safe Initialization. I think it should be left to the consumer to provide that:
var queue = new Lazy<ThreadSafeQueue<int>>(() => new ThreadSafeQueue<int>());

Parallel.For(0, 10000, i =>
{
    if (i % 2 == 0)
        queue.Value.Enqueue(i);
    else
    {
        int item = -1;
        if (queue.Value.TryDequeue(out item) == true)
            Console.WriteLine(item);
    }
});
Should thread-safe class have a memory barrier at the end of its constructor?
I do not see a reason for this. The queue is a local variable that is assigned from one thread and accessed from another. Such concurrent access should be synchronized, and it is the responsibility of the accessing code to do so. It has nothing to do with the constructor or the type of the variable; such access should always be explicitly synchronized, or you are entering a dangerous area even for primitive types (even if the assignment is atomic, you may get caught in some cache trap). If the access to the variable is properly synchronized, it does not need any support in the constructor.
I'll attempt to answer this interesting and well-presented question, based on the comments by Servy and Douglas, and on information coming from other related questions. What follows is just my assumptions, and not solid information from a reputable source.
Thread-safe classes have properties and methods that can be safely invoked by multiple threads concurrently, but their constructors are not thread-safe. This means that it is entirely possible for a thread to "see" an instance of a thread-safe class having an invalid state, provided that the instance is constructed concurrently by another thread.
Adding the line Thread.MemoryBarrier(); at the end of the constructor is not enough to make the constructor thread-safe, because this statement only affects the thread that runs the constructor¹. The other threads that may access concurrently the under-construction instance are not affected. Memory-visibility is cooperative, and one thread cannot change what another thread "sees" by altering the other thread's execution flow (or invalidating the local cache of the CPU-core that the other thread is running on) in a non-cooperative manner.
The correct and robust way to ensure that all threads are seeing the instance having a valid state, is to include proper memory barriers in all threads. This can be achieved by either declaring the instance as volatile, in case it is a field of a class, or otherwise using the methods of the static Volatile class:
ThreadSafeQueue<int> queue = null;

Parallel.For(0, 10000, i =>
{
    if (i == 0)
        Volatile.Write(ref queue, new ThreadSafeQueue<int>());
    else if (i % 2 == 0)
        Volatile.Read(ref queue)?.Enqueue(i);
    else
    {
        int item = -1;
        if (Volatile.Read(ref queue)?.TryDequeue(out item) == true)
            Console.WriteLine(item);
    }
});
In this particular example it would be simpler and more efficient to instantiate the queue variable before invoking the Parallel.For method. Doing so would render unnecessary the explicit Volatile invocations. The Parallel.For method is using Tasks internally, and TPL includes the appropriate memory barriers at the beginning/end of each task. Memory barriers are generated implicitly and automatically by the .NET infrastructure, by any built-in mechanism that starts a thread or causes a delegate to execute on another thread. (citation)
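In other words, under that assumption, simply constructing the queue before the loop needs no Volatile calls at all:
var queue = new ThreadSafeQueue<int>(); // published before the parallel tasks start

Parallel.For(0, 10000, i =>
{
    if (i % 2 == 0)
        queue.Enqueue(i);
    else if (queue.TryDequeue(out int item))
        Console.WriteLine(item);
});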
I'll repeat that I'm not 100% confident about the correctness of the information presented above.
¹ Quoting from the documentation of the Thread.MemoryBarrier method: Synchronizes memory access as follows: The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier() execute after memory accesses that follow the call to MemoryBarrier().
No, you don't need a memory barrier in the constructor. Your assumption, even though demonstrating some creative thought, is wrong. No thread can get a half-baked instance of queue. The new reference is "visible" to the other threads only when the initialization is done. Suppose thread_1 is the first thread to initialize queue - it goes through the ctor code, but queue's reference in the main stack is still null! Only when thread_1 exits the constructor code does it assign the reference.
See the comments below and the OP's elaborated question.

HttpContext.Current.Session null after making method async

I had a method like below
book.Bindbook();
I made it async as follows:
new Task(book.Bindbook).Start();
Now this method uses HttpContext.Current.Session which is now returning null.
Here is code that returns null
public static Bookmanager CartManager
{
    //Gets the value from the session variable.
    get
    {
        try
        {
            if (HttpContext.Current.Session["BookData"] == null)
            {
                Bookmanager bookmgr = new Bookmanager();
                Book book = new Book(SessionManager.CurrentUser);
                bookmgr.SetCurrentCart(book);
                HttpContext.Current.Session["BookData"] = bookmgr;
            }
            else if (((Bookmanager)HttpContext.Current.Session["BookData"]).GetCurrentCart() == null)
            {
                Book book = new Book(SessionManager.CurrentUser);
                ((Bookmanager)HttpContext.Current.Session["BookData"]).SetCurrentCart(book);
            }
        }
        catch (Exception ex)
        {
            //throw ex;
        }
        return ((Bookmanager)HttpContext.Current.Session["BookData"]);
    }
    //Sets the value of the session variable.
    set
    {
        HttpContext.Current.Session["BookData"] = value;
    }
}
There are a lot of potential problems with your solution which have led to this issue. I'll try to break it down into pieces to explain what's going on.
new Task(book.Bindbook).Start() doesn't always run where you think it does
This method of creating an asynchronous operation is subtly dangerous, as it's not easy to know how the task will be executed. When you call this constructor, the Task will capture the TaskScheduler.Current value as the mechanism it will use to schedule its own execution. This means that your task's execution is invisibly tied to the context it's created in.
Typically, you want to use Task.Run(Action) instead of creating a new Task instance and then calling Start, as this always runs on the value of TaskScheduler.Default, which is usually the .NET thread pool and is generally what you want to do when running a background task.
HttpContext is not thread-safe
The HttpContext class was never intended to be called from multiple threads safely. Its Current value is tied to the thread which is processing the request and is not available on other threads. You should not pass it to other threads. Generally speaking, you should reduce the surface area of HttpContext in your applications to a bare minimum. It's nearly impossible to mock for testing purposes and has several subtle limitations (such as the one you are finding) which make it challenging to work with.
Instead, surface the Current value as early as possible in your code and keep a reference to the objects you actually need to work with (like the session).
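For instance, a sketch of that approach applied to the code above (Bindbook taking a Bookmanager parameter is a hypothetical refactoring):
// Capture what the background work needs while still on the request thread,
// where HttpContext.Current is valid.
Bookmanager manager = CartManager;

// The background work now depends only on the captured local, not on HttpContext.
Task.Run(() => book.Bindbook(manager));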
Static properties are usually harmful
Having a static property on an object either means that there is exactly one of these things for the entirety of the AppDomain (such as TaskScheduler.Default), where it represents some cross-cutting concern that can be configured, or that there is some hidden context manipulating the value behind the scenes. The former case is rare, but can be acceptable; the second is pretty harmful. HttpContext.Current is an example of a value that should not be static (and future versions of ASP.NET do away with it entirely). It makes code hard to reason about, nearly impossible to test, and introduces subtle bugs (like this one) which can't easily be dealt with.
Fundamentally, this is the biggest problem here and the root cause of your pain. If this property were exposed as an instance property and the instance was scoped to the request context, you would have none of your issues. Once you're working with an object whose lifetime is the same as your request, all your critical state becomes local and easy to reason about.
Use ConfigureAwait(true) to allow the continuation to run on the original context:
await Task.Run(() => book.Bindbook()).ConfigureAwait(true);
HttpContext is bound to the thread; that's why it is null.
I think a better solution is to pass all needed data to the other thread through parameters instead of sharing HttpContext.

How can a child thread notify a parent thread of its status/progress?

I have a service responsible for many tasks, one of which is to launch jobs (one at a time) on a separate thread (threadJob child). These jobs can take a fair amount of time and have various phases to them which I need to report back.
Every so often a calling application requests the status from the service (GetStatus). This means that somehow the service needs to know at what point the job (child thread) is at. My hope was that at certain milestones the child thread could somehow inform (SetStatus) the parent thread (service) of its status, and the service could return that information to the calling application.
For example - I was looking to do something like this:
class Service
{
    private Thread threadJob;
    private int JOB_STATUS;

    public Service()
    {
        JOB_STATUS = "IDLE";
    }

    public void RunTask()
    {
        threadJob = new Thread(new ThreadStart(PerformWork));
        threadJob.IsBackground = true;
        threadJob.Start();
    }

    public void PerformWork()
    {
        SetStatus("STARTING");
        // do some work //
        SetStatus("PHASE I");
        // do some work //
        SetStatus("PHASE II");
        // do some work //
        SetStatus("PHASE III");
        // do some work //
        SetStatus("FINISHED");
    }

    private void SetStatus(int status)
    {
        JOB_STATUS = status;
    }

    public string GetStatus()
    {
        return JOB_STATUS;
    }
};
So, when a job needs to be performed, RunTask() is called and this launches the thread (threadJob). This will run and perform some steps (using SetStatus to set the new status at various points) and finally finish. Now, there is also the function GetStatus() which should return the STATUS whenever requested (from a calling application using IPC) - this status should reflect the current status of the job running on threadJob.
So, my problem is simple enough...
How can threadJob (or more specifically PerformWork()) return the change in status to Service in a thread-safe manner (I assume my example above of SetStatus/GetStatus is unsafe)? Do I need to use events? I assume I cannot simply change JOB_STATUS directly... Should I use a LOCK (if so, on what?)...
You may have already looked into this, but the BackgroundWorker class gives you a nice interface for running tasks on background threads, and provides events to hook into for notifications that progress has changed.
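A sketch of that approach (BackgroundWorker lives in System.ComponentModel; in a UI app, ProgressChanged is raised on the original SynchronizationContext):
using System;
using System.ComponentModel;
using System.Threading;

var worker = new BackgroundWorker { WorkerReportsProgress = true };

worker.DoWork += (s, e) =>
{
    var w = (BackgroundWorker)s;
    w.ReportProgress(0, "STARTING");
    Thread.Sleep(500); // do some work //
    w.ReportProgress(50, "PHASE I");
    Thread.Sleep(500); // do some work //
    w.ReportProgress(100, "FINISHED");
};

worker.ProgressChanged += (s, e) =>
    Console.WriteLine(e.ProgressPercentage + "%: " + e.UserState);

worker.RunWorkerAsync();
Console.ReadLine(); // keep the process alive while the worker runs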
You could create an event in the Service class and then invoke it in a thread-safe manner. Pay very close attention to how I have implemented the SetStatus method.
class Service
{
    public delegate void JobStatusChangeHandler(string status);

    // Event add/remove auto implemented code is already thread-safe.
    public event JobStatusChangeHandler JobStatusChange;

    public void PerformWork()
    {
        SetStatus("STARTING");
        // stuff
        SetStatus("FINISHED");
    }

    private void SetStatus(string status)
    {
        JobStatusChangeHandler snapshot;
        lock (this)
        {
            // Get a snapshot of the invocation list for the event handler.
            snapshot = JobStatusChange;
        }

        // This is threadsafe because multicast delegates are immutable.
        // If you did not extract the invocation list into a local variable then
        // the event may have all delegates removed after the check for null,
        // which would result in a NullReferenceException when you attempt to
        // invoke it.
        if (snapshot != null)
        {
            snapshot(status);
        }
    }
}
I'd have the child thread raise a 'statusupdate' event, passing a struct with the information necessary for the parent and have the parent subscribe to it when launching it.
You can use the Event-Based Async Pattern.
I would go with a delegate/event from the thread to the caller. If the caller is a UI or something along those lines, it would be nice to the message pump to use appropriate Invoke()s to serialize notifications with the UI's thread when required.
I once wrote an app that needed a marker showing the progress a thread was making. I just used a shared global variable between them. The parent would just read the value, and the thread would just update it. No need to synchronize as only the parent read it, and only the child wrote it atomically. As it happened the parent was redrawing things frequently enough anyhow that it didn't even need to be poked by the child when the child updated the variable. Sometimes the simplest possible way works well.
Your current code mixes strings and ints for JOB_STATUS, which can't work. I'm assuming strings here, but it doesn't really matter, as I'll explain.
Your current implementation is thread-safe in the sense that no memory corruption will occur, since all assignments to reference-type fields are guaranteed to be atomic. The CLR demands this; otherwise you could potentially access unmanaged memory if you could somehow observe partially updated references. Your processor gives you that atomicity for free, however.
So as long as you're using reference types like strings, you won't get any memory corruption. The same is true for primitives like ints (and smaller) and enums based on them. (Just avoid longs and bigger, and non-primitive value types such as nullable integers.)
But, that is not the end of the story: this implementation is not guaranteed to always represent the current state. The reason for this is that the thread that calls GetStatus might be looking at a stale copy of the JOB_STATUS field, because the assignment in SetState contains no so-called memory barrier. That is: the new value for JOB_STATUS need not be sent to your main RAM right away. There are several reasons why this can be delayed:
Writing to main RAM is inherently slow (relatively speaking), which is the reason your processor has all kinds of buffers and L-something caches in the first place, so the processor usually delays memory synchronization. Not for very long, but it will probably delay. This can be quite noticeable on multicore processors, as these usually have separate caches per core.
The JIT might have stored the value of JOB_STATUS in a register earlier on, as part of some optimization strategy. Again, registers are far more efficient to use than your main RAM. However, this does mean that it might not see changes early enough, as it's still looking at the old copy in the register. (We're not talking minutes here, but still.)
So, if you want to be 100% certain that each thread & processor core is immediately aware of the changed status, declare your field as volatile:
private volatile string JOB_STATUS; // string rather than int, per the assumption above
Now, GetStatus/SetStatus, without any locking constructs, is truly thread safe, as volatile demands that the value is read from and written to main RAM immediately (or something 100% equivalent, if the processor can do that more efficiently).
Note that if you don't declare your field as volatile you must use synchronization primitives, such as lock, but generally speaking you need to use the synchronization primitives in both Get and Set; otherwise you won't solve the problem that volatile fixes.
Mind you, as you're doing IPC calls to get the status, I'd wager that you won't ever actually be able to observe any difference between non-volatile and volatile, given the overhead of the IPC calls and the thread synchronizations undoubtedly performed behind the scenes.
For more information on volatile, see volatile (C#) on MSDN.
