ReadOnlyCollections and Threads - Is this code safe?

I have a heavily threaded application with a ReadOnlyCollection as follows:
internal static ReadOnlyCollection<DistributorBackpressure44> DistributorBackpressure44Cache
{
get
{
return _distributorBackpressure44;
}
set
{
_distributorBackpressure44 = value;
}
}
I have one place in the app where this collection is replaced (always on a separate thread) and it looks like this:
CicApplication.DistributorBackpressure44Cache = new ReadOnlyCollection<DistributorBackpressure44>(someQueryResults.ToList());
I have many places in the code where this collection is accessed, usually via Linq queries, in many different threads. The code often looks something like this:
foreach (DistributorBackpressure44 distributorBackpressure44 in CicApplication.DistributorBackpressure44Cache.Where(row => row.Coater == coater && row.CoaterTime >= targetTime).ToList())
{
...
...
}
I assume what I'm doing is thread-safe, without the need to do any locking? What I'm not sure about is what happens with the query above if it occurs at the exact same time the collection is getting replaced in a different thread?

Reference assignments are atomic, so yes, it is thread safe, but only as long as you don't rely on the new data being visible to readers the instant after it is written. A thread may keep using a cached copy of the old reference for a while; marking the backing field volatile prevents that.
See also reference assignment is atomic so why is Interlocked.Exchange(ref Object, Object) needed?.
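A minimal sketch of that suggestion, assuming a hypothetical Cache holder class and plain int rows in place of the question's row type (none of these names come from the question). The backing field is volatile, and each reader copies the reference into a local once, so a whole query runs against a single collection even if a writer swaps the property halfway through.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

// Hypothetical holder class; names are illustrative only.
internal static class Cache
{
    // volatile gives reads and writes of this field acquire/release semantics,
    // so readers do not keep working from a cached copy of the old reference.
    private static volatile ReadOnlyCollection<int> _rows =
        new ReadOnlyCollection<int>(new List<int>());

    internal static ReadOnlyCollection<int> Rows
    {
        get { return _rows; }
        set { _rows = value; }
    }

    // Writer: build the complete list first, then publish it with one assignment.
    internal static void Replace(IEnumerable<int> queryResults)
    {
        Rows = new ReadOnlyCollection<int>(queryResults.ToList());
    }

    // Reader: take the reference once and use only that snapshot afterwards.
    internal static int CountAtLeast(int threshold)
    {
        ReadOnlyCollection<int> snapshot = Rows;
        return snapshot.Count(row => row >= threshold);
    }
}
```

A reader that started before a Replace call simply finishes on the old collection; whether that staleness is acceptable is still a design question for the application.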

It is pretty unlikely. At an entirely unpredictable moment in time, after the thread assigns the property, other threads are going to see the new collection. They'll read a stale value before that. Which might be null or might be another collection with entirely different content.
The randomness is what will get you in trouble. Do note that this has nothing to do with whether or not the collection is ReadOnly. Maybe it is okay that the threads use a stale value, but that isn't very common. Most of all, you didn't mention it was okay, so you might not yet have considered the consequences. You will need to think this through. There isn't anything you can do about that yourself in the property getter and setter; the threads are going to have to negotiate between themselves. This is what makes threading hard and makes it very important that you thoroughly analyze the places in the code where mutable data is shared. Like this property.

It sounds like it won't be a problem in your case, but if the underlying collection that the ReadOnlyCollection is wrapping changes, you could run into problems.
For example, the following code chunk will throw an InvalidOperationException with the message "Collection was modified; enumeration operation may not execute" since the underlying List<int> has an item removed from it while it is being enumerated over on another thread.
var numbers = new List<int>() {1,2,3,4,5};
var readOnly = new ReadOnlyCollection<int>(numbers);
ThreadPool.QueueUserWorkItem(x => {
foreach (int number in readOnly)
{
Console.WriteLine(number);
Thread.Sleep(300);
}
});
Thread.Sleep(150);
numbers.Remove(2);

Related

Is iterating over an array with a for loop a thread safe operation in C# ? What about iterating an IEnumerable<T> with a foreach loop?

Based on my understanding, given a C# array, the act of iterating over the array concurrently from multiple threads is a thread safe operation.
By iterating over the array I mean reading all the positions inside the array by means of a plain old for loop. Each thread is simply reading the content of a memory location inside the array, no one is writing anything so all the threads read the same thing in a consistent manner.
This is a piece of code doing what I wrote above:
public class UselessService
{
private static readonly string[] Names = new [] { "bob", "alice" };
public List<int> DoSomethingUseless()
{
var temp = new List<int>();
for (int i = 0; i < Names.Length; i++)
{
temp.Add(Names[i].Length * 2);
}
return temp;
}
}
So, my understanding is that the method DoSomethingUseless is thread safe and that there is no need to replace the string[] with a thread safe type (like ImmutableArray<string> for instance).
Am I correct ?
Now let's suppose that we have an instance of IEnumerable<T>. We don't know what the underlying object is, we just know that we have an object implementing IEnumerable<T>, so we are able to iterate over it by using the foreach loop.
Based on my understanding, in this scenario there is no guarantee that iterating over this object from multiple threads concurrently is a thread safe operation. Put another way, it is entirely possible that iterating over the IEnumerable<T> instance from different threads at the same time breaks the internal state of the object, so that it becomes corrupted.
Am I correct on this point ?
What about the IEnumerable<T> implementation of the Array class ? Is it thread safe ?
Put another way, is the following code thread safe ? (this is exactly the same code as above, but now the array is iterated by using a foreach loop instead of a for loop)
public class UselessService
{
private static readonly string[] Names = new [] { "bob", "alice" };
public List<int> DoSomethingUseless()
{
var temp = new List<int>();
foreach (var name in Names)
{
temp.Add(name.Length * 2);
}
return temp;
}
}
Is there any reference stating which IEnumerable<T> implementations in the .NET base class library are actually thread safe ?

Is iterating over an array with a for loop a thread safe operation in C# ?
If you're strictly talking about reading from multiple threads, that will be thread safe for Array and List<T> and just about every collection written by Microsoft, regardless of if you're using a for or foreach loop. Especially in the example you have:
var temp = new List<int>();
foreach (var name in Names)
{
temp.Add(name.Length * 2);
}
You can do that across as many threads as you want. They'll all read the same values from Names happily.
If you write to it from another thread (this wasn't your question, but it's worth noting)
Iterating over an Array or List<T> with a for loop, it'll just keep reading, and it'll happily read the changed values as you come across them.
Iterating with a foreach loop, then it depends on the implementation. If a value in an Array changes part way through a foreach loop, it will just keep enumerating and give you the changed values.
With List<T>, it depends what you consider "thread safe". If you are more concerned with reading accurate data, then it kind of is "safe" since it will throw an exception mid-enumeration and tell you that the collection changed. But if you consider throwing an exception to be not safe, then it's not safe.
But it's worth noting that this is a design decision in List<T>: there is code that explicitly looks for changes and throws an exception. Design decisions bring us to the next point:
Can we assume that every collection that implements IEnumerable is safe to read across multiple threads?
In most cases it will be, but thread-safe reading is not guaranteed. The reason is because every IEnumerable requires an implementation of IEnumerator, which decides how to traverse the items in the collection. And just like any class, you can do anything you want in there, including non-thread-safe things like:
Using static variables
Using a shared cache for reading values
Not making any effort to handle cases where the collection changes mid-enumeration
etc.
You could even do something weird like make GetEnumerator() return the same instance of your enumerator every time it's called. That could really make for some unpredictable results.
I consider something to not be thread safe if it can result in unpredictable results. Any of those things could cause unpredictable results.
You can see the source code for the Enumerator that List<T> uses, so you can see that it doesn't do any of that weird stuff, which tells you that enumerating List<T> from multiple threads is safe.

To assert that your code is thread-safe means that we must take your word for it that there is no code inside the UselessService that will try to concurrently replace the contents of the Names array with something like "tom" and "jerry" or (more sinister) null and null. On the other hand, using an ImmutableArray<string> would guarantee that the code is thread-safe, and everybody could be assured of that just by looking at the type of the static readonly field, without having to carefully inspect the rest of the code.
You may find interesting these comments from the source code of the ImmutableArray<T>, regarding some implementation details of this struct:
A readonly array with O(1) indexable lookup time.
This type has a documented contract of being exactly one reference-type field in size. Our own System.Collections.Immutable.ImmutableInterlocked class depends on it, as well as others externally.
IMPORTANT NOTICE FOR MAINTAINERS AND REVIEWERS:
This type should be thread-safe. As a struct, it cannot protect its own fields from being changed from one thread while its members are executing on other threads because structs can change in place simply by reassigning the field containing this struct. Therefore it is extremely important that Every member should only dereference this ONCE. If a member needs to reference the array field, that counts as a dereference of this. Calling other instance members (properties or methods) also counts as dereferencing this. Any member that needs to use this more than once must instead assign this to a local variable and use that for the rest of the code instead. This effectively copies the one field in the struct to a local variable so that it is insulated from other threads.

What is the preferred method of updating a reference to an immutable object?

In case we have an immutable object like an ImmutableList(). What is the preferred method for using this object in a multi threaded environment?
Eg
public class MutableListOfObjects
{
private volatile ImmutableList<object> objList;
public MutableListOfObjects()
{
objList = ImmutableList<object>.Empty;
}
void Add(object o)
{
// Adding a new object to a list will create a new list to ensure immutability of lists.
// Is declaring the object as volatile enough or do we want to
// use other threading concepts?
objList = objList.Add(o);
}
// Will objList always use the latest version of the list?
bool Exist(object o)
{
return objList.Contains(o);
}
}
Is declaring the reference volatile sufficient for achieving the desired behavior? Or is it preferable to use other threading functions?

"Preferred" is contextual. The simplest approach is to use a lock, and in most cases that will do the job very effectively. If you have good reason to think that lock is a problem, then Interlocked is useful:
bool retry;
do {
var snapshot = objList;
var combined = snapshot.Add(o);
retry = Interlocked.CompareExchange(ref objList, combined, snapshot)
!= snapshot;
} while(retry);
This basically works on an optimistic but checked path: most times through, it'll only go through once. Occasionally somebody will change the value of objList while we aren't looking - that's fine, we just try again.
There are, however, pre-canned implementations of thread-safe lists etc, by people who really know what they are talking about. Consider using ConcurrentBag<T> etc. Or just a List<T> with a lock.

A simple and efficient approach is to use ImmutableInterlocked.Update. You pass it a method to perform the add. It calls your add method and then atomically assigns the new value to objList if the list didn't change during the add. If the list changed, Update calls your add method again to retry. It keeps retrying until it is able to write the change.
ImmutableInterlocked.Update(ref objList, l => l.Add(o));
If you have a lot of write contention, such that you'd spend too much time on retries, then using a lock on some stable object (not objList) is preferable.
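As a rough sketch of how that call fits into the question's class (the ObjectList and Demo names are made up for illustration, and ImmutableList<object> is assumed as the list type), several writers can add concurrently without losing updates:

```csharp
using System.Collections.Immutable;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's MutableListOfObjects.
public class ObjectList
{
    private ImmutableList<object> objList = ImmutableList<object>.Empty;

    public void Add(object o)
    {
        // Retries the lambda until the compare-and-swap on objList succeeds.
        ImmutableInterlocked.Update(ref objList, list => list.Add(o));
    }

    public bool Exist(object o)
    {
        // Read the current reference once, then query that snapshot.
        return Volatile.Read(ref objList).Contains(o);
    }

    public int Count
    {
        get { return Volatile.Read(ref objList).Count; }
    }
}

public static class Demo
{
    // Runs several writers in parallel and returns the final item count.
    public static int AddConcurrently(ObjectList target, int writers, int perWriter)
    {
        Parallel.For(0, writers, w =>
        {
            for (int i = 0; i < perWriter; i++)
                target.Add(new object());
        });
        return target.Count;
    }
}
```

With a plain `objList = objList.Add(o)` the same run could drop items, because two writers may both start from the same snapshot.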

volatile will not help you in this case - it will not make reading objList, calling Add() and assigning objList a single atomic step. You should use a locking mechanism instead. volatile only protects against reordering and caching of the reads and writes themselves.
In your case you are creating a new list every time an object is added - usually a much better alternative would be to build the list in a thread-local variable (so that it is not subject to multi-threading issues) and, once the list is complete, mark it as immutable or create an immutable wrapper for it. This way you will get much better performance and memory usage.

Is this use of the generic List thread safe

I have a System.Collections.Generic.List<T> to which I only ever add items in a timer callback. The timer is restarted only after the operation completes.
I have a System.Collections.Concurrent.ConcurrentQueue<T> which stores indices of added items in the list above. This store operation is also always performed in the same timer callback described above.
Is a read operation that iterates the queue and accesses the corresponding items in the list thread safe?
Sample code:
private List<Object> items;
private ConcurrentQueue<int> queue;
private Timer timer;
private void callback(object state)
{
int index = items.Count;
items.Add(new object());
if (true)//some condition here
queue.Enqueue(index);
timer.Change(TimeSpan.FromMilliseconds(500), TimeSpan.FromMilliseconds(-1));
}
//This can be called from any thread
public IEnumerable<object> AccessItems()
{
foreach (var index in queue)
{
yield return items[index];
}
}
My understanding:
Even if the list is resized when it is being indexed, I am only accessing an item that already exists, so it does not matter whether it is read from the old array or the new array. Hence this should be thread-safe.

Is a read operation that iterates the queue and accesses the corresponding items in the list thread safe?
Is it documented as being thread safe?
If no, then it is foolish to treat it as thread safe, even if it is in this implementation by accident. Thread safety should be by design.
Sharing memory across threads is a bad idea in the first place; if you don't do it then you don't have to ask whether the operation is thread safe.
If you have to do it then use a collection designed for shared memory access.
If you can't do that then use a lock. Locks are cheap if uncontended.
If you have a performance problem because your locks are contended all the time then fix that problem by changing your threading architecture rather than trying to do dangerous and foolish things like low-lock code. No one writes low-lock code correctly except for a handful of experts. (I am not one of them; I don't write low-lock code either.)
Even if the list is resized when it is being indexed, I am only accessing an item that already exists, so it does not matter whether it is read from the old array or the new array.
That's the wrong way to think about it. The right way to think about it is:
If the list is resized then the list's internal data structures are being mutated. It is possible that the internal data structure is mutated into an inconsistent form halfway through the mutation, that will be made consistent by the time the mutation is finished. Therefore my reader can see this inconsistent state from another thread, which makes the behaviour of my entire program unpredictable. It could crash, it could go into an infinite loop, it could corrupt other data structures, I don't know, because I'm running code that assumes a consistent state in a world with inconsistent state.

Big edit
The ConcurrentQueue is only safe with regard to the Enqueue(T) and TryDequeue(out T) operations.
You're doing a foreach on it and that doesn't get synchronized at the required level.
The biggest problem in your particular case is the fact that enumerating the Queue (which is a Collection in its own right) might throw the well-known "Collection has been modified" exception. Why is that the biggest problem? Because you are adding things to the queue after you've added the corresponding objects to the list (there's also a great need for the List to be synchronized, but that and the biggest problem get solved with just one "bullet"). While enumerating a collection it is not easy to swallow the fact that another thread is modifying it (even if on a microscopic level the modification is safe - ConcurrentQueue does just that).
Therefore you absolutely need to synchronize access to the queues (and the central List while you're at it) using another means of synchronization (and by that I mean you can also forget about ConcurrentQueue and use a simple Queue or even a List, since you never Dequeue things).
So just do something like:
public void Writer(object toWrite) {
this.rwLock.EnterWriteLock();
try {
int tailIndex = this.list.Count;
this.list.Add(toWrite);
if (..condition1..)
this.queue1.Enqueue(tailIndex);
if (..condition2..)
this.queue2.Enqueue(tailIndex);
if (..condition3..)
this.queue3.Enqueue(tailIndex);
..etc..
} finally {
this.rwLock.ExitWriteLock();
}
}
and in the AccessItems:
public IEnumerable<object> AccessItems(int queueIndex) {
Queue<int> whichQueue = null; // the queues hold indices into this.list
switch (queueIndex) {
case 1: whichQueue = this.queue1; break;
case 2: whichQueue = this.queue2; break;
case 3: whichQueue = this.queue3; break;
..etc..
default: throw new NotSupportedException("Invalid queue disambiguating params");
}
List<object> results = new List<object>();
this.rwLock.EnterReadLock();
try {
foreach (var index in whichQueue)
results.Add(this.list[index]);
} finally {
this.rwLock.ExitReadLock();
}
return results;
}
And, based on my entire understanding of the cases in which your app accesses the List and the various Queues, it should be 100% safe.
End of big edit
First of all: What is this thing you call Thread-Safe ? by Eric Lippert
In your particular case, I guess the answer is no.
It is not the case that inconsistencies might arise in the global context (the actual list).
Instead it is possible that the actual readers (who might very well "collide" with the unique writer) end up with inconsistencies in themselves (their very own stacks, meaning the local variables of all methods, parameters, and also their logically isolated portion of the heap).
The possibility of such "per-thread" inconsistencies (the Nth thread wants to learn the number of elements in the List and finds out that value is 39404999 although in reality you only added 3 values) is enough to declare that, generally speaking, that architecture is not thread-safe (although you don't actually change the globally accessible List, simply by reading it in a flawed manner).
I suggest you use the ReaderWriterLockSlim class.
I think you will find it fits your needs:
private ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
private List<Object> items;
private ConcurrentQueue<int> queue;
private Timer timer;
private void callback(object state)
{
int index = items.Count;
this.rwLock.EnterWriteLock();
try {
// in this place, right here, there can be only ONE writer
// and while the writer is between EnterWriteLock and ExitWriteLock
// there can exist no readers in the following method (between EnterReadLock
// and ExitReadLock)
// we add the item to the List
// AND do the enqueue "atomically" (as loose a term as thread-safe)
items.Add(new object());
if (true)//some condition here
queue.Enqueue(index);
} finally {
this.rwLock.ExitWriteLock();
}
timer.Change(TimeSpan.FromMilliseconds(500), TimeSpan.FromMilliseconds(-1));
}
//This can be called from any thread
public IEnumerable<object> AccessItems()
{
List<object> results = new List<object>();
this.rwLock.EnterReadLock();
try {
// in this place there can exist a thousand readers
// (doing these actions right here, between EnterReadLock and ExitReadLock)
// all at the same time, but NO writers
foreach (var index in queue)
{
results.Add(this.items[index]); // results is a local, not a field
}
} finally {
this.rwLock.ExitReadLock();
}
return results; // or foreach with yield return, if you like that more :)
}

No because you are reading and writing to/from the same object concurrently. This is not documented to be safe so you can't be sure it is safe. Don't do it.
The fact that it is in fact unsafe as of .NET 4.0 means nothing, btw. Even if it was safe according to Reflector it could change anytime. You can't rely on the current version to predict future versions.
Don't try to get away with tricks like this. Why not just do it in an obviously safe way?
As a side note: Two timer callbacks can execute at the same time, so your code is doubly broken (multiple writers). Don't try to pull off tricks with threads.

It is thread-safish. The foreach statement uses the ConcurrentQueue.GetEnumerator() method. Which promises:
The enumeration represents a moment-in-time snapshot of the contents of the queue. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the queue.
Which is another way of saying that your program isn't going to blow up randomly with an inscrutable exception message like the kind you'll get when you use the Queue class. Beware of the consequences though, implicit in this guarantee is that you may well be looking at a stale version of the queue. Your loop will not be able to see any elements that were added by another thread after your loop started executing. That kind of magic doesn't exist and is impossible to implement in a consistent way. Whether or not that makes your program misbehave is something you will have to think about and can't be guessed from the question. It is pretty rare that you can completely ignore it.
Your usage of the List<> is however utterly unsafe.
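The snapshot guarantee quoted above can be seen in a small sketch. The QueueSnapshotDemo name is hypothetical, and for simplicity the writes happen on the enumerating thread rather than on a second one; the point is only that enqueueing during a foreach neither throws nor extends the loop.

```csharp
using System.Collections.Concurrent;

// Hypothetical demo class; not part of the question's code.
public static class QueueSnapshotDemo
{
    // Returns how many items the loop visited while it kept enqueueing new ones.
    public static int EnumerateWhileEnqueueing(ConcurrentQueue<int> queue)
    {
        int visited = 0;
        foreach (int item in queue)
        {
            // With Queue<T> this modification would invalidate the enumerator.
            queue.Enqueue(item + 100);
            visited++;
        }
        return visited;
    }
}
```

Per the documented behavior, the loop should visit only the items that were in the queue when it started; the items enqueued inside the loop show up on the next enumeration.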

Thread safe way of reading a value from a dictionary that may or may not exist

So, I've been spoiled by ConcurrentDictionary and its awesome TryGetValue method. However, I'm constrained to using only regular Dictionary because this is in a portable class library targeting phone and other platforms. I'm trying to write a very limited subset of a Dictionary and expose it in a thread-safe manner.
I basically need something like GetOrAdd from ConcurrentDictionary. Right now, I have this implemented like:
lock (lockdictionary)
{
if (!dictionary.ContainsKey(name))
{
value = new foo();
dictionary[name] = value;
}
value = dictionary[name];
}
Is this basically as good as I can get it? I think locking is only really required if the key doesn't exist and it gets added, however, there is no good "get value if it exists, return null otherwise" method. If I were to leave out the ContainsKey bit, when the key didn't exist I'd get an exception because the key doesn't exist.
Is there any way I could get this to a leaner version? Or is this just the best a regular dictionary can do?

Locking is required even for reading in the presence of concurrent writers. So yes, this is as good as it gets if you mutate the dictionary.
You can of course always create a copy of the entire dictionary each time something is written. That way readers might see an out-of-date version but they can safely read.
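A minimal sketch of that copy-on-write idea, assuming string keys and a hypothetical CopyOnWriteMap class (neither is from the question). Writers take a lock, copy the dictionary, add to the copy and publish it; readers only ever see dictionaries that are no longer modified, so they need no lock.

```csharp
using System.Collections.Generic;

// Hypothetical copy-on-write wrapper; names are illustrative only.
public class CopyOnWriteMap<TValue> where TValue : class, new()
{
    private readonly object writeLock = new object();

    // Each published dictionary is treated as immutable once assigned here.
    private volatile Dictionary<string, TValue> current =
        new Dictionary<string, TValue>();

    public TValue GetOrAdd(string name)
    {
        TValue value;

        // Fast path: lock-free read of the currently published dictionary.
        if (current.TryGetValue(name, out value))
            return value;

        lock (writeLock)
        {
            // Re-check under the lock: another writer may have added the key.
            if (current.TryGetValue(name, out value))
                return value;

            // Copy, modify the private copy, then publish with one assignment.
            var copy = new Dictionary<string, TValue>(current);
            value = new TValue();
            copy[name] = value;
            current = copy;
            return value;
        }
    }
}
```

Every write copies the whole dictionary, so this only pays off when reads vastly outnumber writes and the dictionary stays reasonably small.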

You could try using ReaderWriterLockSlim. For example:
ReaderWriterLockSlim locker = new ReaderWriterLockSlim();
//..
public foo GetOrAdd(string name)
{
foo value; // the dictionary is assumed to map string to foo, as in the question
locker.EnterUpgradeableReadLock();
try
{
if(!dictionary.ContainsKey(name))
{
locker.EnterWriteLock();
try
{
dictionary[name] = new foo();
}
finally
{
locker.ExitWriteLock();
}
}
value = dictionary[name];
}
finally
{
locker.ExitUpgradeableReadLock();
}
return value;
}

Your implementation is just fine. Note that lock has a negligible performance penalty in the case of uncontended access. However, in order to achieve true thread safety you must use lock with EVERY operation on the dictionary - I suggest writing a wrapper class, like SynchronizedDictionary, to keep the sync logic in one place.
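Such a wrapper might look roughly like the sketch below. The member set and the valueFactory parameter are assumptions chosen for illustration, not a prescribed design. It also uses Dictionary's own TryGetValue, which gives the "get value if it exists" lookup the question was missing and avoids the separate ContainsKey call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: every access to the inner dictionary goes through one lock.
public class SynchronizedDictionary<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> inner = new Dictionary<TKey, TValue>();
    private readonly object gate = new object();

    public TValue GetOrAdd(TKey key, Func<TKey, TValue> valueFactory)
    {
        lock (gate)
        {
            TValue value;
            // One lookup instead of ContainsKey followed by the indexer.
            if (!inner.TryGetValue(key, out value))
            {
                value = valueFactory(key);
                inner[key] = value;
            }
            return value;
        }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (gate) { return inner.TryGetValue(key, out value); }
    }

    public bool Remove(TKey key)
    {
        lock (gate) { return inner.Remove(key); }
    }
}
```

Note that the factory runs while the lock is held, so it should be cheap and must not call back into the same dictionary from another thread that it then waits on.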

You can use a double-check pattern, as follows:
if (!dictionary.ContainsKey(name))
{
lock (lockdictionary)
{
if (!dictionary.ContainsKey(name))
{
value = new foo();
dictionary[name] = value;
}
value = dictionary[name];
}
}
This ensures you only lock if you actually need to, but also ensures once you have locked that you still need to add the value. The performance should be better than always locking. But don't take my word for it. Run a test!

This is as good as it gets.
Locking is required because Dictionary makes no guarantee that you can update and read in parallel at all. Even a single call to get an element, running at the same time as an update on another thread, may fail due to changes to internal data structures.
Note that the behavior is explicitly covered in the Thread Safety section of the Dictionary documentation:
A Dictionary can support multiple readers concurrently, as long as the collection is not modified. Even so, enumerating through a collection is intrinsically not a thread-safe procedure. In the rare case where an enumeration contends with write accesses, the collection must be locked during the entire enumeration. To allow the collection to be accessed by multiple threads for reading and writing, you must implement your own synchronization.

Thread Synchronization in .NET

In my app I have a List of objects. I'm going to have a process (thread) running every few minutes that will update the values in this list. I'll have other processes (other threads) that will just read this data, and they may attempt to do so at the same time.
When the list is being updated, I don't want any other process to be able to read the data. However, I don't want the read-only processes to block each other when no updating is occurring. Finally, if a process is reading the data, the process that updates the data must wait until the process reading the data is finished.
What sort of locking should I implement to achieve this?

This is what you are looking for.
ReaderWriterLockSlim is a class that will handle the scenario you have asked for.
You have two pairs of methods at your disposal:
EnterWriteLock and ExitWriteLock
EnterReadLock and ExitReadLock
The first pair waits until all other locks, both read and write, have been released, so it gives you exclusive access just like lock() would.
Read locks are compatible with each other: any number of them can be held at the same time.
Because there's no syntactic sugar like with the lock() statement, make sure you never forget to exit the lock, whether because of an exception or anything else. So use it in a form like this:
// rwLock is a ReaderWriterLockSlim field ("lock" itself is a keyword and cannot be a variable name)
rwLock.EnterWriteLock(); // or EnterReadLock
try
{
//Your code here, which can possibly throw an exception.
}
finally
{
rwLock.ExitWriteLock(); // or ExitReadLock
}

You don't make it clear whether the updates to the list will involve modification of existing objects, or adding/removing new ones - the answers in each case are different.
To handle modification of existing items in the list, each object should handle its own locking.
To allow modification of the list while others are iterating it, don't allow people direct access to the list - force them to work with a read-only copy of the list, like this:
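public class Example
{
public IEnumerable<X> GetReadOnlySnapshot()
{
lock (padLock)
{
// Copy first: ReadOnlyCollection<X> only wraps the list it is given,
// so wrapping MasterList directly would still expose later changes.
return new ReadOnlyCollection<X>(new List<X>(MasterList));
}
}
private object padLock = new object();
}
Copying the master list under the lock and wrapping the copy in a ReadOnlyCollection<X> ensures that readers can iterate through a list of fixed content, without blocking modifications made by writers.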

You could use ReaderWriterLockSlim. It would satisfy your requirements precisely. However, it is likely to be slower than just using a plain old lock. The reason is that RWLS is ~2x slower than lock, and accessing a List is so fast that it would not be enough to overcome the additional overhead of the RWLS. Test both ways, but it is likely ReaderWriterLockSlim will be slower in your case. Reader-writer locks do better in scenarios where the number of readers significantly outnumbers the writers and when the guarded operations are long and drawn out.
However, let me present another option for you. One common pattern for dealing with this type of problem is to use two separate lists. One will serve as the official copy which can accept updates and the other will serve as the read-only copy. After you update the official copy you must clone it and swap out the reference for the read-only copy. This is elegant in that the readers require no blocking whatsoever. The reason why readers do not require any blocking type of synchronization is that we are treating the read-only copy as if it were immutable. Here is how it can be done.
public class Example
{
private readonly List<object> m_Official;
private volatile List<object> m_Readonly;
public Example()
{
m_Official = new List<object>();
m_Readonly = new List<object>(); // separate empty list, so readers never touch m_Official
}
public void Update()
{
lock (m_Official)
{
// Modify the official copy here.
m_Official.Add(...);
m_Official.Remove(...);
// Now clone the official copy.
var clone = new List<object>(m_Official);
// And finally swap out the read-only copy reference.
m_Readonly = clone;
}
}
public object Read(int index)
{
// It is safe to access the read-only copy here because it is immutable.
// m_Readonly must be marked as volatile for this to work correctly.
return m_Readonly[index];
}
}
The code above would not satisfy your requirements precisely because readers never block...ever. Which means they will still be taking place while writers are updating the official list. But, in a lot of scenarios this winds up being acceptable.
