I would like to know whether I need a lock when two threads access the same variable, one reading and the other writing.
For example: we have two threads, A and B. Thread A reads variable x at time T and Thread B writes variable x at time T.
Should I consider some type of lock here?
In my case I have the main thread and many sub-threads. The main thread holds a List<myObj>; before starting any sub-thread I create an instance of myObj, add it to the List<myObj>, and pass it to the sub-thread.
At some point the List has to be sorted by a value contained in myObj, and it can easily happen that a List element being read by the main thread is simultaneously being written by a sub-thread.
Any suggestions would be appreciated.
YES
Don't even consider any alternative until you have a deep and thorough understanding of how multi-threading works. Obligatory link: http://www.albahari.com/threading/, at the very least. And unless you have very good reasons, not even then - especially with something as complex as a List.
Any time you're accessing any shared state, make sure that all the ways to share the state are synchronized.
It's possible to use lock-less synchronization, but that's a rather advanced subject, and prone to errors. If you're only updating a primitive value, Interlocked might be good enough.
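As a rough illustration of the Interlocked case, here is a minimal sketch. The class name InterlockedCounterSketch and its Run method are made up for this example; only Interlocked.Increment, Volatile.Read and Parallel.For come from the framework.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any library.
public static class InterlockedCounterSketch
{
    private static int _counter;

    // Runs `workers` parallel loops that each increment the shared counter `perWorker` times.
    public static int Run(int workers, int perWorker)
    {
        _counter = 0;
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                // Atomic read-modify-write; a plain _counter++ here could lose updates.
                Interlocked.Increment(ref _counter);
            }
        });
        return Volatile.Read(ref _counter);
    }
}
```

This only covers a single primitive value. As soon as two values have to change together, or a collection is involved, a lock is the safer default.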
However, don't forget the contracts of the objects you're working with - List's sort is only safe if the items don't change during the sorting. So before starting the sort, you need to ensure noöne is modifying anything that could change the ordering while the sort is underway.
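One way to honour that contract is to route both the item updates and the sort through the same lock. The following is only a sketch under assumptions: WorkItem stands in for the asker's myObj, SortKey for the value the list is sorted on, and SharedItems is a hypothetical wrapper.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the asker's myObj.
public sealed class WorkItem
{
    public int SortKey { get; set; }
}

// Hypothetical wrapper: every access to the list or to SortKey goes through _gate.
public sealed class SharedItems
{
    private readonly object _gate = new object();
    private readonly List<WorkItem> _items = new List<WorkItem>();

    public WorkItem Add(int sortKey)
    {
        var item = new WorkItem { SortKey = sortKey };
        lock (_gate) { _items.Add(item); }
        return item;
    }

    // Sub-threads call this instead of assigning item.SortKey directly.
    public void Update(WorkItem item, int newSortKey)
    {
        lock (_gate) { item.SortKey = newSortKey; }
    }

    // Main thread: no Update can run while the sort is in progress.
    public int[] SortAndSnapshot()
    {
        lock (_gate)
        {
            _items.Sort((a, b) => a.SortKey.CompareTo(b.SortKey));
            return _items.ConvertAll(i => i.SortKey).ToArray();
        }
    }
}
```

The lock only helps if every writer uses Update; a sub-thread that sets SortKey directly bypasses it and the sort is unsafe again.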
Do you really need those sub-threads? Do they really need to update the list (/items) from a separate thread? Perhaps posting the change to the UI thread would work well enough, while avoiding those multi-threading issues?
I remember I saw a class with 2 exciting properties. I couldn't remember the class name but I think it was a collection, I'm not sure. The first property was named Read*word* and the second property was called Write*word*. Sadly I don't remember. Let me try to explain why these 2 properties exist. The idea was when a write happens we allow the already running "Reads" to finish but new ones will need to wait. See below how one would use this class.
var list = new ThreadSafeList();
// This is how we should read the collection.
using(var read = list.ReadLock)
{
// If a write is happening we will wait until it finishes.
await read.WaitAsync();
// Do something with the list. If we are here in the execution no writing can happen, only reads.
}
// This is how we should change the collection.
using(var write = list.WriteLock)
{
await write.WaitAsync();
// Modify the collection, nobody reads it, so we are safe.
}
Do I remember right? Is there a collection that has similar properties? If not, can somebody tell me how I should make a similar class? Is this an AutoResetEvent, a ManualResetEvent, or maybe a Semaphore? I don't know.
The idea was when a write happens we allow the already running "Reads" to finish but new ones will need to wait. See below how one would use this class.
Sounds like a reader/writer lock to me. Reader/writer locks can be used with collections, although the lock itself does not contain items.
There is a synchronous reader/writer lock in the BCL. There is no asynchronous version, though.
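For reference, the synchronous lock in the BCL is ReaderWriterLockSlim. A minimal sketch of guarding a list with it could look like this; RwlProtectedList is a made-up name for illustration.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper around a List<int> guarded by the BCL's ReaderWriterLockSlim.
public sealed class RwlProtectedList
{
    private readonly ReaderWriterLockSlim _rwl = new ReaderWriterLockSlim();
    private readonly List<int> _items = new List<int>();

    public void Add(int value)
    {
        _rwl.EnterWriteLock();          // exclusive: waits for readers and writers to leave
        try { _items.Add(value); }
        finally { _rwl.ExitWriteLock(); }
    }

    public int Sum()
    {
        _rwl.EnterReadLock();           // shared: other readers may enter concurrently
        try
        {
            int sum = 0;
            foreach (int v in _items) sum += v;
            return sum;
        }
        finally { _rwl.ExitReadLock(); }
    }
}
```

Note that the lock must be exited on the same thread that entered it, which is one reason it cannot be held across an await.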
Usually, when people ask for a reader/writer lock, I ask them to carefully reflect whether they really need it. Just the fact that you have some code acting as a reader and other code acting as a writer does not imply you should have a reader/writer lock. RWLs are possibly one of the most misused synchronization primitives; a simple lock (or SemaphoreSlim for the asynchronous case) usually suffices just fine.
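For the asynchronous case, a SemaphoreSlim with a count of one can act as a simple mutual-exclusion lock. This is a sketch only; AsyncGuardedList is a hypothetical name, and readers and writers are deliberately treated the same.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper: SemaphoreSlim(1, 1) used as an async-compatible mutex.
public sealed class AsyncGuardedList
{
    private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1, 1);
    private readonly List<int> _items = new List<int>();

    public async Task AddAsync(int value)
    {
        await _mutex.WaitAsync();       // waits without blocking a thread
        try { _items.Add(value); }
        finally { _mutex.Release(); }
    }

    public async Task<int> CountAsync()
    {
        await _mutex.WaitAsync();
        try { return _items.Count; }
        finally { _mutex.Release(); }
    }
}
```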
Is there a collection that has similar properties?
There are certainly reader/writer collections available. These work by containing a number of items and permit different parts of code to produce or consume items. Hence the common name "producer/consumer" collections.
The one I would recommend these days is System.Threading.Channels. TPL Dataflow is another option.
Note that you don't have to take any locks with Channels/Dataflow. Your producers just produce data items, and your consumers just consume them. All the locking is internal to these collections.
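To give an idea of what that looks like with System.Threading.Channels, here is a small sketch with several producers and one consumer. ChannelSketch and ProduceAndConsumeAsync are hypothetical names, and it assumes C# 8 or later for await foreach.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical demo class; no explicit locks appear anywhere in user code.
public static class ChannelSketch
{
    public static async Task<int> ProduceAndConsumeAsync(int producers, int itemsPerProducer)
    {
        Channel<int> channel = Channel.CreateUnbounded<int>();

        // Each producer writes the numbers 1..itemsPerProducer.
        var producerTasks = new List<Task>();
        for (int p = 0; p < producers; p++)
        {
            producerTasks.Add(Task.Run(async () =>
            {
                for (int i = 1; i <= itemsPerProducer; i++)
                    await channel.Writer.WriteAsync(i);
            }));
        }

        // The consumer sums items until the writer side is completed.
        Task<int> consumer = Task.Run(async () =>
        {
            int sum = 0;
            await foreach (int item in channel.Reader.ReadAllAsync())
                sum += item;
            return sum;
        });

        await Task.WhenAll(producerTasks);
        channel.Writer.Complete();      // lets ReadAllAsync finish
        return await consumer;
    }
}
```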
If not can somebody tell me how I should make a similar class
If, after careful reflection, you really do want an asynchronous reader/writer lock, you can make one like this.
I'm working on making my SortedDictionary thread safe and the thing I'm not sure about is: is it safe to have a call to add to SortedDictionary in one thread, like this:
dictionary.Add(key, value);
and simply get an item from this dictionary in another thread, like this:
variable = dictionary[key];
There is no explicit enumeration in either of those places, so it looks safe, but it would be great to be sure about it.
No, it is not safe to read and write SortedDictionary<K,V> concurrently: adding an element to a sorted dictionary may involve re-balancing of the tree, which may cause the concurrent read operation to take a wrong turn while navigating to the element of interest.
In order to fix this problem you would need to either wrap an instance of SortedDictionary<K,V> in a class that performs explicit locking, or roll your own collection compatible with the interfaces implemented by SortedDictionary<K,V>.
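The wrapper option might look roughly like the sketch below. LockedSortedDictionary is a made-up name, and only a few members are shown; a real wrapper would cover every operation you use.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: every operation on the inner dictionary runs under one lock.
public sealed class LockedSortedDictionary<TKey, TValue>
{
    private readonly object _gate = new object();
    private readonly SortedDictionary<TKey, TValue> _inner = new SortedDictionary<TKey, TValue>();

    public void Add(TKey key, TValue value)
    {
        lock (_gate) { _inner.Add(key, value); }
    }

    public bool TryGetValue(TKey key, out TValue value)
    {
        lock (_gate) { return _inner.TryGetValue(key, out value); }
    }

    // Enumeration is done on a copy so callers never iterate the live tree.
    public List<KeyValuePair<TKey, TValue>> Snapshot()
    {
        lock (_gate) { return new List<KeyValuePair<TKey, TValue>>(_inner); }
    }
}
```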
No. Anything that modifies the tree is not thread safe at all. The trick is to fill up the SortedDictionary in one thread, then treat it as immutable and let multiple threads read from it. (You can do this with a SortedDictionary, as stated here. I mention this because there may be a collection/dictionary/map out there somewhere that is changed when it is read, so you should always check.)
If you need to modify it once it's released into the wild, you have a problem. You need to lock it to write to it, and all the readers need to respect that lock, which means they need to lock it too, which means the readers can no longer read it simultaneously. The best way around this is usually to create a whole new SortedDictionary, then, once the new one is immutable, replace the reference to the original with a reference to the new one. (You need a volatile reference to do this right.) The readers will switch dictionaries cleanly without a problem. And the old dictionary won't go away until the last reader has finished reading and released its reference.
(There are n-readers and 1-writer locks, but you want to avoid any locking at all.)
(And keep in mind the reference to the dictionary can change suddenly if you're enumerating. Use a local variable for this rather than referring to the (volatile) reference.)
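The replace-the-reference approach could be sketched as follows. SnapshotDictionary is a hypothetical class; it assumes there may be more than one writer, so writers are serialized with a lock among themselves, while readers take no lock at all.

```csharp
using System.Collections.Generic;

// Hypothetical copy-and-swap holder: published dictionaries are never mutated.
public sealed class SnapshotDictionary
{
    private readonly object _writeGate = new object();
    private volatile SortedDictionary<string, int> _current = new SortedDictionary<string, int>();

    // Readers copy the reference into a local first, so one call sees one dictionary.
    public bool TryGet(string key, out int value)
    {
        SortedDictionary<string, int> snapshot = _current;
        return snapshot.TryGetValue(key, out value);
    }

    // Writers build a whole new dictionary, then publish it with a single reference write.
    public void Set(string key, int value)
    {
        lock (_writeGate)
        {
            var next = new SortedDictionary<string, int>(_current);
            next[key] = value;
            _current = next;
        }
    }
}
```

Every write copies the whole dictionary, so this fits data that is read far more often than it is changed.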
Java has a ConcurrentSkipListMap, which allows any number of simultaneous reads and writes, but I don't think there's anything like it in .NET yet. And if there is, it's going to be slower for reads than an immutable SortedDictionary anyway.
No, because it is not documented to be safe. That is the real reason. Reasoning with implementation details is not as good because they are details that you cannot rely on.
No, it is not safe to do so. If you want to use the dictionary from multiple threads, guard every access to it with the same lock, like this:
private readonly object lockObject = new object();

lock (lockObject)
{
    // your dictionary operation here
}
I've already read previous questions here about ConcurrentBag but did not find an actual sample of implementation in multi-threading.
"ConcurrentBag is a thread-safe bag implementation, optimized for scenarios where the same thread will be both producing and consuming data stored in the bag."
This is the current usage in my code (simplified, not the actual code):
private void MyMethod()
{
List<Product> products = GetAllProducts(); // Get list of products
ConcurrentBag<Product> myBag = new ConcurrentBag<Product>();
//products were simply added here in the ConcurrentBag to simplify the code
//actual code process each product before adding in the bag
Parallel.ForEach(
products,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
product => myBag.Add(product));
ProcessBag(myBag); // method to process each items in the concurrentbag
}
My questions:
Is this the right usage of ConcurrentBag? Is it ok to use ConcurrentBag in this kind of scenario?
For me I think a simple List<Product> and a manual lock will do better. The reason for this is that the scenario above already breaks the "same thread will be both producing and consuming data stored in the bag" rule.
I also found out that the ThreadLocal storage created in each thread of the parallel loop will still exist after the operation (even if the thread is reused, is this right?), which may cause an undesired memory leak.
Am I right about this? Or is a simple Clear or Empty method to remove the items in the ConcurrentBag enough?
This looks like an ok use of ConcurrentBag. The thread local variables are members of the bag, and will become eligible for garbage collection at the same time the bag is (clearing the contents won't release them). You are right that a simple List with a lock would suffice for your case. If the work you are doing in the loop is at all significant, the type of thread synchronization won't matter much to the overall performance. In that case, you might be more comfortable using what you are familiar with.
Another option would be to use ParallelEnumerable.Select, which matches what you are trying to do more closely. Again, any performance difference you are going to see is likely going to be negligible and there's nothing wrong with sticking with what you know.
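A minimal sketch of the ParallelEnumerable.Select route is shown below. PlinqSketch and ProcessAll are made-up names, and squaring an int stands in for whatever per-product processing the real code does.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: PLINQ collects the results, so no shared collection is needed.
public static class PlinqSketch
{
    public static List<int> ProcessAll(IEnumerable<int> inputs)
    {
        return inputs
            .AsParallel()
            .AsOrdered()            // optional: keep results in input order
            .Select(x => x * x)     // placeholder for the real per-item work
            .ToList();
    }
}
```

Dropping AsOrdered is fine when the order of the results does not matter.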
As always, if the performance of this is critical there's no substitute for trying it and measuring.
It seems to me that bmm6o's answer is not correct. The ConcurrentBag instance internally contains mini-bags for each thread that adds items to it, so item insertion does not involve any thread locks, and thus all Environment.ProcessorCount threads may get into full swing without being stuck waiting and without any thread context switches. Thread synchronization may be required when iterating over the collected items, but again, in the original example the iteration is done by a single thread after all insertions are done. Moreover, if the ConcurrentBag uses Interlocked techniques as the first layer of thread synchronization, then it is possible not to involve Monitor operations at all.
On the other hand, using an ordinary List<T> instance and wrapping each Add() call in a lock statement will hurt performance. First, every call pays for Monitor.Enter() and Monitor.Exit(), which under contention may have to fall back to kernel-mode synchronization primitives. Second, one thread may occasionally be blocked by another thread that has not finished its addition yet.
As for me, the code above is a really good example of the right usage of ConcurrentBag class.
Is this the right usage of ConcurrentBag? Is it ok to use ConcurrentBag in this kind of scenario?
No, for multiple reasons:
This is not the intended usage scenario for this collection. The ConcurrentBag<T> is intended for mixed producer-consumer scenarios, meaning that each thread is expected to add and take items from the bag. Your scenario is nothing like this. You have many threads that add items, and zero threads that take items. The main application for the ConcurrentBag<T> is for making object-pools (pools of reusable objects that are expensive to create or destroy). And given the availability of the ObjectPool<T> class in the Microsoft.Extensions.ObjectPool package, even this niche application for this collection is contested.
It doesn't preserve the insertion order. Even if preserving the insertion order is not important, getting a shuffled output makes the debugging more difficult.
It creates garbage that has to be collected by the GC. It creates one WorkStealingQueue (internal class) per thread, each containing an expandable array, so the more threads you have, the more objects you allocate. Also, each time it is enumerated it copies all the items into an array, and its GetEnumerator() method returns an IEnumerator<T>, so an enumerator object is allocated on each foreach.
There are better options available, offering both better performance and better ordering behavior.
In your scenario you can store the results of the parallel execution in a simple array. Just create an array with length equal to the products.Count, switch from the Parallel.ForEach to the Parallel.For, and assign the result directly to the corresponding slot of the results array without doing any synchronization at all:
List<Product> products = GetAllProducts(); // Get list of products
Product[] results = new Product[products.Count];
Parallel.For(0, products.Count,
new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
i => results[i] = products[i]);
ProcessResults(results);
This way you'll get the results with perfect ordering, stored in a container that has the most compact size and the fastest enumeration of all .NET collections, doing only a single object allocation.
In case you are concerned about the thread-safety of the above operation, there is nothing to worry about. Each thread writes on different slots in the results array. After the completion of the parallel execution the current thread has full visibility of all the values that are stored in the array, because the TPL includes the appropriate barriers when tasks are queued, and at the beginning/end of task execution (citation).
(I have posted more thoughts about the ConcurrentBag<T> in this answer.)
If List<T> is used with a lock around the Add() method, it will make threads wait and will reduce the performance gain of using Parallel.ForEach().
If I have an array that can/will be accessed by multiple threads at any given point in time, what exactly causes it to be non-thread safe, and what would be the steps taken to ensure that the array would be thread safe in most situations?
I have looked extensively around the internet and have found little to no information on this subject; everything seems to be about specific scenarios (e.g. is this array, accessed like this by these two threads, thread-safe, and so on). I would really like it if someone could either answer the questions I laid out at the top or point me towards a good document explaining these things.
EDIT:
After looking around on MSDN, I found the ArrayList class. When you use its Synchronized method, it returns a thread-safe wrapper for a given list. When setting data in the list (e.g. list1[someNumber] = anotherNumber;), does the wrapper automatically take care of locking the list, or do you still need to lock it?
When two threads are accessing the exact same resource (e.g., not local copies, but actually the same copy of the same resource), a number of things can happen. In the most obvious scenario, if Thread #1 is accessing a resource and Thread #2 changes it mid-read, some unpredictable behavior can happen. Even with something as simple as an integer, you could have logic errors arise, so try to imagine the horrors that can result from improperly using something more complicated, like a database access class that's declared as static.
The classical way of handling this problem is to put a lock on the sensitive resources so only one thread can use it at a time. So in the above example, Thread #1 would request a lock to a resource and be granted it, then go in to read what it needs to read. Thread #2 would come along mid-read and request a lock to the resource, but be denied and told to wait because Thread #1 is using it. When Thread #1 finishes, it releases the lock and it's OK for Thread #2 to proceed.
There are other situations, but this illustrates one of the most basic problems and solutions. In C#, you may:
1) Use specific .NET objects that are managed as lockable by the framework (like Scorpion-Prince's link to SynchronizedCollection)
2) Use [MethodImpl(MethodImplOptions.Synchronized)] to dictate that a specific method that does something dangerous should only be used by one thread at a time
3) Use the lock statement to isolate specific lines of code that are doing something potentially dangerous
What approach is best is really up to your situation.
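Options 2 and 3 can be sketched side by side. TwoCounters is a hypothetical class; the two counters are kept separate on purpose, because the attribute locks on the instance itself while the lock statement here uses a private object, so the two would not exclude each other if they touched the same field.

```csharp
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical demo of two of the listed options.
public sealed class TwoCounters
{
    private readonly object _gate = new object();
    private int _viaAttribute;
    private int _viaLock;

    // Option 2: the whole method body runs under a lock on `this`.
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void IncrementViaAttribute() { _viaAttribute++; }

    // Option 3: only the sensitive lines are locked, using a private lock object.
    public void IncrementViaLock()
    {
        lock (_gate) { _viaLock++; }
    }

    // Hammers both methods from several threads and returns the final counts.
    public (int ViaAttribute, int ViaLock) RunDemo(int workers, int perWorker)
    {
        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                IncrementViaAttribute();
                IncrementViaLock();
            }
        });
        return (_viaAttribute, _viaLock);
    }
}
```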
If I have an array that can/will be accessed by multiple threads at any given point in time, what exactly causes it to be non-thread safe, and what would be the steps taken to ensure that the array would be thread safe in most situations?
In general terms, an array is not thread-safe because two or more threads can modify its contents at the same time if you do not synchronize access to it.
Speaking generally, for example, let's suppose you have thread 1 doing this work:
for (int i = 0; i < array.Length; i++)
{
array[i] = "Hello";
}
And thread 2 doing this work (on the same shared array)
for (int i = 0; i < array.Length; i++)
{
array[i] = "Goodbye";
}
There isn't anything synchronizing the threads, so the result depends on which thread wins the race for each element. The array may end up holding 'Hello' and 'Goodbye' in some unpredictable mix, but every element will be either 'Hello' or 'Goodbye'.
The actual write of the string 'Hello' or 'Goodbye' is guaranteed by the CLR to be atomic. That is to say, the writing of the value 'Hello' cannot be interrupted by a thread trying to write 'Goodbye'. One must occur before or after the other, never in between.
So you need to create some kind of synchronization mechanism to prevent the threads from stepping on each other. You can accomplish this by using a lock statement in C#.
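As a sketch of what that lock buys you: if each thread holds the same lock for its whole loop, the array ends up entirely 'Hello' or entirely 'Goodbye', never a mix. ArrayFillSketch is a made-up name for this illustration.

```csharp
using System.Threading.Tasks;

// Hypothetical demo: each fill loop runs as one unit under a shared lock.
public static class ArrayFillSketch
{
    private static readonly object Gate = new object();

    private static void Fill(string[] array, string value)
    {
        lock (Gate)
        {
            for (int i = 0; i < array.Length; i++)
                array[i] = value;
        }
    }

    // Runs both fills concurrently; whichever takes the lock last determines the content.
    public static string[] Run(int length)
    {
        var array = new string[length];
        Parallel.Invoke(
            () => Fill(array, "Hello"),
            () => Fill(array, "Goodbye"));
        return array;
    }
}
```

Which of the two values wins is still not deterministic; the lock only guarantees that the array is consistent.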
.NET Framework 3.0 and above provide a generic collection class called SynchronizedCollection<T>, which "provides a thread-safe collection that contains objects of a type specified by the generic parameter as elements."
Public static members of System.Array are thread safe; instance members are not guaranteed to be. System.Array implements the ICollection interface, which defines members such as SyncRoot to support synchronization.
However, enumerating through the array's items is not thread safe. The developer should use a lock statement to make sure the array does not change during the enumeration.
EX:
Array arrThreadSafe = new string[] {"We", "are", "safe"};
lock(arrThreadSafe.SyncRoot)
{
foreach (string item in arrThreadSafe)
{
Console.WriteLine(item);
}
}
Time and time again I find myself having to write thread-safe versions of BindingList and ObservableCollection because, when bound to UI, these collections cannot be changed from multiple threads. What I'm trying to understand is why this is the case - is it a design fault or is this behavior intentional?
The problem is designing a thread safe collection is not simple. Sure it's simple enough to design a collection which can be modified/read from multiple threads without corrupting state. But it's much more difficult to design a collection that is usable given that it's updated from multiple threads. Take the following code as an example.
if ( myCollection.Count > 0 ) {
var x = myCollection[0];
}
Assume that myCollection is a thread safe collection where adds and updates are guaranteed not to corrupt state. This code is not thread safe and is a race condition.
Why? Even though myCollection is safe, there is no guarantee that a change does not occur between the two calls to myCollection: namely Count and the indexer. Another thread can come in and remove all elements between these calls.
This type of problem makes using a collection of this type quite frankly a nightmare. You can't ever let the return value of one call influence a subsequent call on the collection.
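Two common ways around the Count-then-index race are sketched below: use a collection whose API performs the check and the read as one operation, or hold one lock across both steps. CheckThenActSketch and its method names are hypothetical; ConcurrentQueue<T>.TryPeek is a real framework method.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helpers showing two ways to make "check then read" a single step.
public static class CheckThenActSketch
{
    // One atomic call replaces the separate Count check and indexer read.
    public static bool TryGetFirst(ConcurrentQueue<int> queue, out int first)
    {
        return queue.TryPeek(out first);
    }

    // With a plain List<T>, hold one lock across both steps.
    // Every thread that modifies the list must take the same `gate`.
    public static bool TryGetFirstLocked(List<int> list, object gate, out int first)
    {
        lock (gate)
        {
            if (list.Count > 0)
            {
                first = list[0];
                return true;
            }
            first = default;
            return false;
        }
    }
}
```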
EDIT
I expanded this discussion on a recent blog post: http://blogs.msdn.com/jaredpar/archive/2009/02/11/why-are-thread-safe-collections-so-hard.aspx
To add a little to Jared's excellent answer: thread safety does not come for free. Many (most?) collections are only used within a single thread. Why should those collections have performance or functionality penalties to cope with the multi-threaded case?
Gathering ideas from all the other answers, I think this is the simplest way to resolve your issues:
Change your question from:
"Why isn't class X sane?"
to
"What is the sane way of doing this with class X?"
In your class's constructor, get the current dispatcher as you create
your observable collections. Because, as you pointed out, modifications need to
be done on the original thread, which may not be the main GUI thread,
App.Current.Dispatcher isn't always right,
and not all classes have a this.Dispatcher.
_dispatcher = System.Windows.Threading.Dispatcher.CurrentDispatcher;
_data = new ObservableCollection<MyDataItemClass>();
Use the dispatcher to Invoke the code sections that need the original thread.
_dispatcher.Invoke(new Action(() => { _data.Add(dataItem); }));
That should do the trick for you. Though there are situations you might prefer .BeginInvoke instead of .Invoke.
If you want to go crazy - here's a ThreadedBindingList<T> that does notifications back on the UI thread automatically. However, it would still only be safe for one thread to be making updates etc at a time.