Does ConcurrentQueue.IsEmpty require a memory barrier? - c#

When looking into IsEmpty, I noticed this on MSDN:
However, as this collection is intended to be accessed concurrently, it may be the case that another thread will modify the collection after IsEmpty returns, thus invalidating the result.
Sure this is true, but does this also imply that ConcurrentQueue doesn't use a read barrier when checking if the queue is empty?
I want to have a piece of code that checks in another thread if the concurrent queue is empty. Something like this:
while (!queue.IsEmpty)
{
}
However, if ConcurrentQueue doesn't use a read barrier, I'd say we need to add our own memory barrier to ensure we read the right data, like this:
Thread.MemoryBarrier();
while (!queue.IsEmpty)
{
Thread.MemoryBarrier();
}
(BTW: this is just a minimal example to illustrate the case; there's more code in reality.)
Is my observation correct? Or does ConcurrentQueue handle this itself, so that the first implementation works (which is what I would expect from something called 'Concurrent')?
And what about 'Count'? I can't find the answer on MSDN... Same story?

No, it's nothing to do with memory barriers. It's simply got to be the case that something else could nip in and add or remove something to or from the queue just after your test.
You shouldn't really use IsEmpty. Use TryDequeue instead. Or use a BlockingCollection.
You really don't want to start writing code that "locks" the queue while you mess around with IsEmpty or Count.
(I almost never use ConcurrentQueue nowadays, since BlockingCollection is so much better, although that does of course depend on what you're trying to do.)
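To make the TryDequeue suggestion concrete, here's a minimal sketch (purely illustrative, assuming a queue of strings):

using System;
using System.Collections.Concurrent;

var queue = new ConcurrentQueue<string>();
queue.Enqueue("hello");

// TryDequeue atomically combines "is there an item?" and "remove it",
// so there's no gap for another thread to sneak into between the two.
while (queue.TryDequeue(out var item))
{
    Console.WriteLine(item); // stand-in for real processing
}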

Related

Is there a Microsoft-implemented thread-safe collection that works in a similar way?

I remember I saw a class with 2 exciting properties. I couldn't remember the class name, but I think it was a collection; I'm not sure. The first property was named Read*word* and the second property was called Write*word*. Sadly, I don't remember the exact names. Let me try to explain why these 2 properties exist. The idea was that when a write happens, we allow the already running "Reads" to finish, but new ones will need to wait. See below how one would use this class.
var list = new ThreadSafeList();

// This is how we should read the collection.
using (var read = list.ReadLock)
{
    // If a write is happening, we will wait until it finishes.
    await read.WaitAsync();
    // Do something with the list. If we are here in the execution,
    // no writing can happen, only reads.
}

// This is how we should change the collection.
using (var write = list.WriteLock)
{
    await write.WaitAsync();
    // Modify the collection; nobody reads it, so we are safe.
}
Do I remember right? Is there a collection that has similar properties? If not, can somebody tell me how I should make a similar class? Is this an AutoResetEvent or a ManualResetEvent, or maybe a Semaphore? I don't know.
The idea was that when a write happens, we allow the already running "Reads" to finish, but new ones will need to wait. See below how one would use this class.
Sounds like a reader/writer lock to me. Reader/writer locks can be used with collections, although the lock itself does not contain items.
There is a synchronous reader/writer lock in the BCL. There is no asynchronous version, though.
Usually, when people ask for a reader/writer lock, I ask them to carefully reflect whether they really need it. Just the fact that you have some code acting as a reader and other code acting as a writer does not imply you should have a reader/writer lock. RWLs are possibly one of the most misused synchronization primitives; a simple lock (or SemaphoreSlim for the asynchronous case) usually suffices just fine.
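For illustration only, a minimal sketch of that simpler alternative: a SemaphoreSlim used as an asynchronous mutex around an ordinary list (all names here are hypothetical):

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1, 1); // one holder at a time

    public async Task AddAsync(T item)
    {
        await _mutex.WaitAsync(); // asynchronous counterpart of lock(...)
        try
        {
            _items.Add(item); // readers and writers all funnel through the same mutex
        }
        finally
        {
            _mutex.Release();
        }
    }
}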
Is there a collection that has similar properties?
There are certainly reader/writer collections available. These work by holding a number of items and permitting different parts of the code to produce or consume items, hence the common name "producer/consumer" collections.
The one I would recommend these days is System.Threading.Channels. TPL Dataflow is another option.
Note that you don't have to take any locks with Channels/Dataflow. Your producers just produce data items, and your consumers just consume them. All the locking is internal to these collections.
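As a sketch of what that looks like (assuming the System.Threading.Channels package is referenced; names are illustrative):

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();

// Producer: just writes items; all locking is internal to the channel.
var producer = Task.Run(async () =>
{
    await channel.Writer.WriteAsync("work item");
    channel.Writer.Complete(); // signal that no more items are coming
});

// Consumer: asynchronously waits for items until the channel completes.
await foreach (var item in channel.Reader.ReadAllAsync())
{
    Console.WriteLine(item);
}

await producer;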
If not can somebody tell me how I should make a similar class
If, after careful reflection, you really do want an asynchronous reader/writer lock, you can make one like this.

When to lock a thread-safe collection in .NET? (& when not to lock?)

Ok, I have read Thread safe collections in .NET and Why lock Thread safe collections?.
The former question, being Java-centered, doesn't answer my question, and the answer to the latter says that I don't need to lock the collection because such collections are supposed to be thread-safe (which is what I thought).
Now coming to my question,
A lot of developers I see (on GitHub and in my organisation) have started using the new thread-safe collections. However, they often don't remove the lock around read & write operations.
I don't understand this. Isn't a thread-safe collection ... well, completely thread-safe?
What could be the implications involved in not locking a thread-safe collection ?
EDIT: PS: here's my case,
I have a lot of classes, and some of them have an attribute on them. Very often I need to check if a given type has that attribute or not (using reflection, of course). This could be expensive on performance. So I decided to create a cache using a ConcurrentDictionary<string,bool>, string being the type name and bool specifying if it has the attribute. At first the cache is empty; the plan was to keep adding to it as and when required. I came across the GetOrAdd() method of ConcurrentDictionary, and my question is about the same: should I call this method without locking?
The remarks on MSDN says:
If you call GetOrAdd simultaneously on different threads,
addValueFactory may be called multiple times, but its key/value pair
might not be added to the dictionary for every call.
You should not lock a thread-safe collection; it exposes methods to update the collection that already lock internally. Use them as intended.
The thread safe collection may not match your needs for instance if you want to prevent modification while an enumerator is opened on the collection (the provided thread safe collections allow modifications). If that's the case you'd better use a regular collection and lock it everywhere. The internal locks of the thread safe collections aren't publicly available.
It's hard to answer about implication in not locking a thread-safe collection. You don't need to lock a thread-safe collection but you may have to lock your code that does multiple things. Hard to tell without seeing the code.
Yes, the method is thread-safe, but it might call addValueFactory multiple times if you hit an add for the same key at the same time. In the end only one of the values will be added; the others will be discarded. It might not be an issue... you'll have to check how often you may reach this situation, but I think it's not common, and you can live with the performance penalty in an edge case that may never occur.
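Applied to the attribute cache from the question, a minimal sketch (MyAttribute stands in for the real attribute type):

using System;
using System.Collections.Concurrent;

[AttributeUsage(AttributeTargets.Class)]
class MyAttribute : Attribute { } // placeholder for the real attribute

static class AttributeCache
{
    private static readonly ConcurrentDictionary<string, bool> Cache =
        new ConcurrentDictionary<string, bool>();

    // No lock needed: GetOrAdd is thread-safe. The factory may run more than
    // once under contention, but IsDefined is idempotent, so that's harmless.
    public static bool HasMyAttribute(Type type) =>
        Cache.GetOrAdd(type.FullName ?? type.Name,
                       _ => type.IsDefined(typeof(MyAttribute), inherit: false));
}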
You could also build your dictionary in a static ctor or before you need it. This way, the dictionary is filled once and you don't ever write to it again. The dictionary is then read-only, and you need neither a lock nor a thread-safe collection.
A method of a class typically changes the object from state A to state B. However, another thread may also change the state of the object during the execution of that method, potentially leaving the object in an unstable state.
For instance, a list may want to check if its underlying data buffer is large enough before adding a new item:
void Add(object item)
{
    int requiredSpace = Count + 1;
    if (buffer.Length < requiredSpace)
    {
        // increase underlying buffer
    }
    buffer[Count] = item;
}
Now if a list has buffer space for only one more item, and two threads attempt to add an item at the same time, they may both decide that no additional buffer space is required, potentially causing an IndexOutOfRangeException on one of these threads.
Thread-safe classes ensure that this does not happen.
This does not mean that using a thread-safe class makes your code thread-safe:
int count = myConcurrentCollection.Count;
myConcurrentCollection.Add(item);
count++;
if (myConcurrentCollection.Count != count)
{
    // some other thread has added or removed an item
}
So although the collection is thread safe, you still need to consider thread-safety for your own code. The enumerator example Guillaume mentioned is a perfect example of where threading issues might occur.
In regards to your comment, the documentation for ConcurrentDictionary mentions:
All these operations are atomic and are thread-safe with regards to all other operations on the ConcurrentDictionary class. The only exceptions are the methods that accept a delegate, that is, AddOrUpdate and GetOrAdd. For modifications and write operations to the dictionary, ConcurrentDictionary uses fine-grained locking to ensure thread safety. (Read operations on the dictionary are performed in a lock-free manner.) However, delegates for these methods are called outside the locks to avoid the problems that can arise from executing unknown code under a lock. Therefore, the code executed by these delegates is not subject to the atomicity of the operation.
So yes, these overloads (the ones that take a delegate) are exceptions.
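If that matters in your case, a common workaround (a sketch, not from the answer above) is to cache Lazy<T> values, so that even if the factory delegate races, only the winning entry's computation is ever observed:

using System;
using System.Collections.Concurrent;

var cache = new ConcurrentDictionary<string, Lazy<int>>();

// Several Lazy wrappers may be created under contention, but only the one
// actually stored in the dictionary has its Value computed and returned.
int value = cache.GetOrAdd("key", _ => new Lazy<int>(() => Expensive())).Value;
Console.WriteLine(value);

static int Expensive() => 42; // placeholder for the real computation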

Is it safe to put TryDequeue in a while loop?

I have not used concurrent queue before.
Is it OK to use TryDequeue as below, in a while loop? Could this not get stuck forever?
var cq = new ConcurrentQueue<string>();
cq.Enqueue("test");
string retValue;
while (!cq.TryDequeue(out retValue))
{
    // Maybe sleep?
}
// Do rest of code
It's safe in the sense that the loop won't actually end until there is an item it has pulled out, and that it will eventually end if the queue has an item to be taken out. If the queue is emptied by another thread and no more items are added then of course the loop will not end.
Beyond all of that, what you have is a busy loop. This should virtually always be avoided. Either you end up constantly polling the queue asking for more items, wasting CPU time and effort in the process, or you end up sleeping and therefore not actually using the item in the queue as soon as it is added (and even then, still wasting some time/effort on context switches just to poll the queue).
What you should be doing instead, if you find yourself in the position of wanting to "wait until there is an item for me to take" is use a BlockingCollection. It is specifically designed to wrap various types of concurrent collections and block until there is an item available to take. It allows you to change your code to queue.Take() and have it be easier to write, semantically stating what you're doing, be clearly correct, noticeably more effective, and completely safe.
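A minimal sketch of that change (types and names are just illustrative):

using System;
using System.Collections.Concurrent;

var queue = new BlockingCollection<string>(new ConcurrentQueue<string>());
queue.Add("test");

// Take() blocks until an item is available: no polling, no sleeping.
string retValue = queue.Take();
// Do rest of code
Console.WriteLine(retValue);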
Yes it is safe as per the documentation, but it is not a recommended design.
It might get "Stuck forever" if the queue was empty at the first call TryDequeue, and if no other thread pushes data in the queue after that point (you could break the while after N attempts or after a timeout, though).
ConcurrentQueue offers an IsEmpty property to check whether there are items in the queue. It is much more efficient to check it than to loop over a TryDequeue call (particularly if the queue is generally empty).
What you might want to do is :
while (cq.IsEmpty)
{
    // Maybe sleep / wait / ...
}
if (cq.TryDequeue(out retValue))
{
    ...
}
EDIT:
If this last call returns false, another of your threads dequeued the item. If you don't have other threads, this is safe; if you do, you should use while (TryDequeue) instead.

Is it safe to call Add on SortedDictionary in one thread and get Item in another?

I'm working on making my SortedDictionary thread safe and the thing I'm not sure about is: is it safe to have a call to add to SortedDictionary in one thread, like this:
dictionary.Add(key, value);
and simply get an item from this dictionary in another thread, like this:
variable = dictionary[key];
There is no explicit enumeration in either of those places, so it looks safe, but it would be great to be sure about it.
No, it is not safe to read and write SortedDictionary<K,V> concurrently: adding an element to a sorted dictionary may involve re-balancing of the tree, which may cause the concurrent read operation to take a wrong turn while navigating to the element of interest.
In order to fix this problem you would need to either wrap an instance of SortedDictionary<K,V> in a class that performs explicit locking, or roll your own collection compatible with the interfaces implemented by SortedDictionary<K,V>.
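A minimal sketch of the wrapping approach (hypothetical names): every operation takes the same private lock, so a read can never observe the tree mid-rebalance.

using System.Collections.Generic;

class SynchronizedSortedDictionary<TKey, TValue>
{
    private readonly SortedDictionary<TKey, TValue> _inner = new SortedDictionary<TKey, TValue>();
    private readonly object _gate = new object();

    public void Add(TKey key, TValue value)
    {
        lock (_gate) { _inner.Add(key, value); } // writers serialize on the gate
    }

    public TValue this[TKey key]
    {
        get { lock (_gate) { return _inner[key]; } } // readers take the same gate
    }
}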
No. Anything that modifies the tree is not thread safe at all. The trick is to fill up the SortedDictionary in one thread, then treat it as immutable and let multiple threads read from it. (You can do this with a SortedDictionary, as stated here. I mention this because there may be a collection/dictionary/map out there somewhere that is changed when it is read, so you should always check.)
If you need to modify it once it's released into the wild, you have a problem. You need to lock it to write to it, and all the readers need to respect that lock, which means they need to lock it too, which means the readers can no longer read it simultaneously. The best way around this is usually to create a whole new SortedDictionary, then, once the new one is immutable, replace the reference to the original with a reference to the new one. (You need a volatile reference to do this right.) The readers will switch dictionaries cleanly without a problem. And the old dictionary won't go away until the last reader has finished reading and released its reference.
(There are n-readers and 1-writer locks, but you want to avoid any locking at all.)
(And keep in mind the reference to the dictionary can change suddenly if you're enumerating. Use a local variable for this rather than referring to the (volatile) reference.)
Java has a ConcurrentSkipListMap, which allows any number of simultaneous reads and writes, but I don't think there's anything like it in .NET yet. And if there is, it's going to be slower for reads than an immutable SortedDictionary anyway.
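A minimal sketch of the replace-the-reference pattern described above (hypothetical names; it assumes a single writer, as in the answer):

using System.Collections.Generic;

class SnapshotMap
{
    // volatile: readers always see the most recently published dictionary.
    private volatile SortedDictionary<string, int> _current =
        new SortedDictionary<string, int>();

    public bool TryGet(string key, out int value)
    {
        var snapshot = _current; // local copy: the snapshot can't change mid-read
        return snapshot.TryGetValue(key, out value);
    }

    // Single writer only: builds a whole new dictionary, then publishes it atomically.
    public void Update(string key, int value)
    {
        var next = new SortedDictionary<string, int>(_current); // copy the old snapshot
        next[key] = value;
        _current = next; // the volatile write publishes the new snapshot to all readers
    }
}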
No, because it is not documented to be safe. That is the real reason. Reasoning with implementation details is not as good because they are details that you cannot rely on.
No, it is not safe to do so. If you want to use it from multiple threads, then you should do this:
private readonly object lockObject = new object();

lock (lockObject)
{
    // your dictionary operation here
}

Thread.VolatileRead Implementation

I'm looking at the implementation of the VolatileRead/VolatileWrite methods (using Reflector), and I'm puzzled by something.
This is the implementation for VolatileRead:
[MethodImpl(MethodImplOptions.NoInlining)]
public static int VolatileRead(ref int address)
{
    int num = address;
    MemoryBarrier();
    return num;
}
How come the memory barrier is placed after reading the value of "address"? Isn't it supposed to be the opposite (placed before reading the value, so any pending writes to "address" will be completed by the time we make the actual read)?
The same goes for VolatileWrite, where the memory barrier is placed before the assignment of the value. Why is that?
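For reference, the VolatileWrite counterpart referred to here looks like this (reconstructed from the same disassembly, so treat it as illustrative):

[MethodImpl(MethodImplOptions.NoInlining)]
public static void VolatileWrite(ref int address, int value)
{
    MemoryBarrier(); // barrier comes *before* the store
    address = value;
}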
Also, why do these methods have the NoInlining attribute? What could happen if they were inlined?
I thought that until recently. Volatile reads aren't what you think they are - they're not about guaranteeing that they get the most recent value; they're about making sure that no read which is later in the program code is moved to before this read. That's what the spec guarantees - and likewise for volatile writes, it guarantees that no earlier write is moved to after the volatile one.
You're not alone in suspecting this code, but Joe Duffy explains it better than I can :)
My answer to this is to give up on lock-free coding other than by using things like PFX which are designed to insulate me from it. The memory model is just too hard for me - I'll leave it to the experts, and stick with things that I know are safe.
One day I'll update my threading article to reflect this, but I think I need to be able to discuss it more sensibly first...
(I don't know about the no-inlining part, btw. I suspect that inlining could introduce some other optimizations which aren't meant to happen around volatile reads/writes, but I could easily be wrong...)
Maybe I am oversimplifying, but I think the explanations about reordering and cache coherency and so on give too much detail.
So, why does the MemoryBarrier come after the actual read?
I will try to explain this with an example that uses an object instead of an int.
One may think the correct order is:
Thread 1 creates the object (initializes its inner data).
Thread 1 then puts the object into a variable.
Then it "does a fence" and all threads see the new value.
Then, the read is something like this:
Thread 2 "does a fence".
Thread 2 reads the object instance.
Thread 2 is sure that it has all the inner data of that instance (as it started with a fence).
The biggest problem with this is:
Thread 1 creates the object and initializes it.
Thread 1 then puts the object into a variable.
Before the thread flushes the cache, the CPU itself flushes part of the cache... it commits only the address of the variable (not the contents of that variable).
At that moment, Thread 2 has already flushed its cache, so it is going to read everything from main memory.
So, it reads the variable (it is there).
Then it reads the contents (they are not there).
Finally, after all this, CPU 1 executes the part of Thread 1 that does the fence.
So, what happens with the volatile write and read?
The volatile write makes the contents of the object go to memory immediately (it starts with the fence), and only then sets the variable (which may not go immediately to real memory).
Then, the volatile read will first clear the cache and then read the field. If it receives a value when reading the field, it is certain that the contents pointed to by that reference are really there.
Because of those little details, yes, it is possible that you do a VolatileWrite(1) and another thread still sees the value of zero. But as soon as other threads see the value of 1 (using a volatile read), all the other items that may be referenced are already there. You can't really tell when reading the old value (0 or null); you may simply not progress, considering that you don't yet have everything you need.
I have already seen discussions saying that, even if this flushes the caches twice, the right pattern would be:
MemoryBarrier - will flush other variables changed before this call
Write
MemoryBarrier - will guarantee that the write was flushed
The read will then need the same:
MemoryBarrier
Read - guarantees that we see the latest info... maybe one that was put AFTER our memory barrier
As something may have appeared after our MemoryBarrier and was already read, we must put another MemoryBarrier to access the contents.
Those could be two write fences or two read fences, if such things existed in .NET.
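As a sketch, that "double fence" pattern would look like this in C# (the shared field is hypothetical; .NET only exposes a full Thread.MemoryBarrier, not separate read/write fences):

using System.Threading;

static class DoubleFence
{
    private static int _flag; // hypothetical shared field

    public static void Publish()
    {
        Thread.MemoryBarrier(); // flush variables changed before this call
        _flag = 1;              // the write itself
        Thread.MemoryBarrier(); // guarantee the write itself was flushed
    }

    public static int Consume()
    {
        Thread.MemoryBarrier(); // see the latest info
        int value = _flag;      // the read itself
        Thread.MemoryBarrier(); // fence again before reading contents published meanwhile
        return value;
    }
}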
I am not sure about everything I said... this is a "compilation" of a lot of information I gathered, but it explains why VolatileRead and VolatileWrite appear to be reversed, and also why no invalid values are read when using them.
