So I have an IList as the value in my ConcurrentDictionary.
ConcurrentDictionary<int, IList<string>> list1 = new ConcurrentDictionary<int, IList<string>>();
In order to update a value in a list I do this:
if (list1.ContainsKey(key))
{
IList<string> templist;
list1.TryGetValue(key, out templist);
templist.Add("helloworld");
}
However, does adding a string to templist update the ConcurrentDictionary? If so, is the update thread-safe so that no data corruption would occur?
Or is there a better way to update or create a list inside the ConcurrentDictionary?
EDIT
If I were to use a ConcurrentBag instead of a List, how would I implement this? More specifically, how could I update it? ConcurrentDictionary's TryUpdate method feels a bit excessive.
Does ConcurrentBag.Add update the ConcurrentDictionary in a thread-safe manner?
ConcurrentDictionary<int, ConcurrentBag<string>> list1 = new ConcurrentDictionary<int, ConcurrentBag<string>>();
Firstly, there's no need to do ContainsKey() and TryGetValue().
You should just do this:
IList<string> templist;
if (list1.TryGetValue(key, out templist))
templist.Add("helloworld");
In fact your code as written has a race condition.
In between one thread calling ContainsKey() and TryGetValue(), a different thread may have removed the item with that key. Then TryGetValue() will set templist to null, and you'll get a NullReferenceException when you call templist.Add().
Secondly, yes: There's another possible threading issue here. You don't know that the IList<string> stored inside the dictionary is threadsafe.
Therefore calling tempList.Add() is not guaranteed to be safe.
You could use ConcurrentQueue<string> instead of IList<string>. This is probably going to be the most robust solution.
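A minimal sketch of that suggestion, written as C# top-level statements. It assumes the queue can be created on demand with GetOrAdd; the key 1 and the item strings are made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var queues = new ConcurrentDictionary<int, ConcurrentQueue<string>>();

// Many threads append to the queue stored under the same (illustrative) key.
Parallel.For(0, 1000, i =>
{
    // GetOrAdd returns the queue already stored for the key, or stores a new one.
    ConcurrentQueue<string> queue = queues.GetOrAdd(1, _ => new ConcurrentQueue<string>());
    queue.Enqueue("helloworld " + i); // Enqueue is itself thread-safe
});

Console.WriteLine(queues[1].Count); // prints 1000
```

The dictionary only guards the lookup; it is the queue's own thread safety that makes the concurrent Enqueue calls safe.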
Note that simply locking access to the IList<string> wouldn't be sufficient.
This is no good:
if (list1.TryGetValue(key, out templist))
{
lock (locker)
{
templist.Add("helloworld");
}
}
unless you also use the same lock everywhere else that the IList may be accessed. This is not easy to achieve, hence it's better to either use a ConcurrentQueue<> or add locking to this class and change the architecture so that no other threads have access to the underlying IList.
Operations on a thread-safe dictionary are thread-safe by key, so to speak. So as long as you access your values (in this case an IList<T>) from only one thread, you're good to go.
The ConcurrentDictionary does not prevent two threads from accessing the value belonging to one key at the same time.
You can use the ConcurrentDictionary.AddOrUpdate method to add an item to the list in a thread-safe way. It's simpler and should work fine.
var list1 = new ConcurrentDictionary<int, IList<string>>();
list1.AddOrUpdate(key,
new List<string>() { "test" }, (k, l) => { l.Add("test"); return l;});
UPD
According to the docs and sources, the factory delegates passed to the AddOrUpdate method are run outside the lock scope, so calling List methods inside a factory delegate is NOT thread-safe.
See comments under this answer.
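One way around this, which is not from the answer above, is to make the update delegate side-effect free by storing an immutable list. The sketch below assumes System.Collections.Immutable is available (it ships with modern .NET and is a NuGet package on .NET Framework). The key and strings are illustrative.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;
using System.Threading.Tasks;

var lists = new ConcurrentDictionary<int, ImmutableList<string>>();

Parallel.For(0, 1000, i =>
{
    lists.AddOrUpdate(
        1,
        _ => ImmutableList.Create("test " + i),       // no entry yet: start a new list
        (_, existing) => existing.Add("test " + i));  // returns a NEW list, never mutates
});

Console.WriteLine(lists[1].Count); // prints 1000
```

Because the delegates never mutate shared state, it does not matter that they run outside the dictionary's lock or that they may be invoked again when another thread wins the race; AddOrUpdate simply retries against the newer value.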
The ConcurrentDictionary has no effect on whether you can apply changes to value objects in a thread-safe manner or not. That is the responsibility of the value object (the IList implementation in your case).
Looking at the answers of No ConcurrentList<T> in .Net 4.0? there are some good reasons why there is no ConcurrentList implementation in .net.
Basically you have to take care of thread-safe changes yourself. The simplest way is to use the lock statement. E.g.
lock (templist)
{
templist.Add("hello world");
}
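A slightly fuller sketch of the same idea, assuming every piece of code that touches the list agrees to lock on the list instance itself (the key and strings are illustrative). The lock only helps if readers take it too.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

var lists = new ConcurrentDictionary<int, List<string>>();

Parallel.For(0, 1000, i =>
{
    List<string> list = lists.GetOrAdd(1, _ => new List<string>());
    lock (list) // writers lock on the list instance stored in the dictionary
    {
        list.Add("hello world " + i);
    }
});

int count;
List<string> stored = lists[1];
lock (stored) // readers must take the very same lock
{
    count = stored.Count;
}
Console.WriteLine(count); // prints 1000
```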
Another way is to use the ConcurrentBag from the .NET Framework. But this is only useful if you do not rely on the IList interface or on the ordering of items.
It has already been mentioned that the best solution is a ConcurrentDictionary with ConcurrentBag values. I'm just going to add how to do that:
ConcurrentBag<string> bag = new ConcurrentBag<string>();
bag.Add("inputString");
list1.AddOrUpdate(key, bag, (k, v) => {
    v.Add("inputString");
    return v;
});
does adding a string to templist update the ConcurrentDictionary?
It does not.
Your thread safe collection (Dictionary) holds references to non-thread-safe collections (IList). So changing those is not thread safe.
I suppose you should consider using mutexes.
If you use ConcurrentBag<T>:
var dic = new ConcurrentDictionary<int, ConcurrentBag<string>>();
Something like this could work OK:
public static class ConcurentDictionaryExt
{
public static ConcurrentBag<V> AddToInternal<K, V>(this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, V value)
=> dic.AddOrUpdate(key,
k => new ConcurrentBag<V>() { value },
(k, existingBag) =>
{
existingBag.Add(value);
return existingBag;
}
);
public static ConcurrentBag<V> AddRangeToInternal<K, V>(this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, IEnumerable<V> values)
=> dic.AddOrUpdate(key,
k => new ConcurrentBag<V>(values),
(k, existingBag) =>
{
foreach (var v in values)
existingBag.Add(v);
return existingBag;
}
);
}
I didn't test it yet :)
Related
My working assumption is that LINQ is thread-safe when used with the System.Collections.Concurrent collections (including ConcurrentDictionary).
(Other Overflow posts seem to agree: link)
However, an inspection of the implementation of the LINQ OrderBy extension method shows that it appears not to be threadsafe with the subset of concurrent collections which implement ICollection (e.g. ConcurrentDictionary).
The OrderedEnumerable GetEnumerator (source here) constructs an instance of a Buffer struct (source here) which tries to cast the collection to an ICollection (which ConcurrentDictionary implements) and then performs a collection.CopyTo with an array initialised to the size of the collection.
Therefore, if the ConcurrentDictionary (as the concrete ICollection in this case) grows in size during the OrderBy operation, between initialising the array and copying into it, this operation will throw.
The following test code shows this exception:
(Note: I appreciate that performing an OrderBy on a thread-safe collection which is changing underneath you is not that meaningful, but I do not believe it should throw)
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Program
{
class Program
{
static void Main(string[] args)
{
try
{
int loop = 0;
while (true) //Run many loops until exception thrown
{
Console.WriteLine($"Loop: {++loop}");
_DoConcurrentDictionaryWork().Wait();
}
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
private static async Task _DoConcurrentDictionaryWork()
{
var concurrentDictionary = new ConcurrentDictionary<int, object>();
var keyGenerator = new Random();
var tokenSource = new CancellationTokenSource();
var orderByTaskLoop = Task.Run(() =>
{
var token = tokenSource.Token;
while (token.IsCancellationRequested == false)
{
//Keep ordering concurrent dictionary on a loop
var orderedPairs = concurrentDictionary.OrderBy(x => x.Key).ToArray(); //THROWS EXCEPTION HERE
//...do some more work with ordered snapshot...
}
});
var updateDictTaskLoop = Task.Run(() =>
{
var token = tokenSource.Token;
while (token.IsCancellationRequested == false)
{
//keep mutating dictionary on a loop
var key = keyGenerator.Next(0, 1000);
concurrentDictionary[key] = new object();
}
});
//Wait for 1 second
await Task.Delay(TimeSpan.FromSeconds(1));
//Cancel and dispose token
tokenSource.Cancel();
tokenSource.Dispose();
//Wait for orderBy and update loops to finish (now token cancelled)
await Task.WhenAll(orderByTaskLoop, updateDictTaskLoop);
}
}
}
That the OrderBy throws an exception leads to one of a few possible conclusions:
1) My assumption about LINQ being threadsafe with concurrent collections is incorrect, and it is only safe to perform LINQ on collections (be they concurrent or not) which are not mutating during the LINQ query
2) There is a bug in the implementation of LINQ OrderBy: it is incorrect for the implementation to cast the source collection to an ICollection and perform the collection copy (it should just drop through to its default behaviour of iterating the IEnumerable).
3) I have misunderstood what is going on here...
Thoughts much appreciated!
It's not stated anywhere that OrderBy (or other LINQ methods) should always use GetEnumerator of source IEnumerable or that it should be thread safe on concurrent collections. All that is promised is this method
Sorts the elements of a sequence in ascending order according to a
key.
ConcurrentDictionary is not thread-safe in some global sense either. It's thread-safe with respect to other operations performed on it. Even more, documentation says that
All public and protected members of ConcurrentDictionary
are thread-safe and may be used concurrently from multiple threads.
However, members accessed through one of the interfaces the
ConcurrentDictionary implements, including extension
methods, are not guaranteed to be thread safe and may need to be
synchronized by the caller.
So, your understanding is correct (OrderBy will see IEnumerable you pass to it is really ICollection, will then get length of that collection, allocate buffer of that size, then will call ICollection.CopyTo, and this is of course not thread safe on any type of collection), but it's not a bug in OrderBy because neither OrderBy nor ConcurrentDictionary ever promised what you assume.
If you want to do OrderBy in a thread safe way on ConcurrentDictionary, you need to rely on methods that are promised to be thread safe. For example:
// note: this is NOT IEnumerable.ToArray()
// but public ToArray() method of ConcurrentDictionary itself
// it is guaranteed to be thread safe with respect to other operations
// on this dictionary
var snapshot = concurrentDictionary.ToArray();
// we are working on snapshot so no one other thread can modify it
// of course at this point real contents of dictionary might not be
// the same as our snapshot
var sorted = snapshot.OrderBy(c => c.Key);
If you don't want to allocate an additional array (with ToArray), you can use Select(c => c) and it will work in this case, but then we are again in murky territory, relying on something to be safe in a situation it was never promised to be safe in (Select will also not always enumerate your collection: if the collection is an array or a list, it will take a shortcut and use indexers instead). So you can create an extension method like this:
public static class Extensions {
public static IEnumerable<T> ForceEnumerate<T>(this ICollection<T> collection) {
foreach (var item in collection)
yield return item;
}
}
And use it like this if you want to be safe and don't want to allocate array:
concurrentDictionary.ForceEnumerate().OrderBy(c => c.Key).ToArray();
In this case we are forcing enumeration of the ConcurrentDictionary (which we know is safe from the documentation) and then passing that to OrderBy, knowing that it cannot do any harm with a pure IEnumerable. Note that, as correctly pointed out in the comments by mjwills, this is not exactly the same as ToArray, because ToArray produces a snapshot (it locks the collection, preventing modifications while building the array), whereas Select / yield does not acquire any locks (so items might be added or removed while enumeration is in progress). I doubt it matters for the scenario described in the question, though: in both cases, once OrderBy has completed, you have no idea whether your ordered results reflect the current state of the collection or not.
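The snapshot approach can be exercised with a small stress sketch like the one below. It is only an illustration: the loop count and key range are arbitrary, and a run without an exception demonstrates the pattern rather than proving it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var dictionary = new ConcurrentDictionary<int, object>();
var cts = new CancellationTokenSource();

// Writer: keeps mutating the dictionary until cancelled.
var writer = Task.Run(() =>
{
    var random = new Random();
    while (!cts.IsCancellationRequested)
        dictionary[random.Next(0, 1000)] = new object();
});

int[] lastKeys = Array.Empty<int>();
for (int i = 0; i < 200; i++)
{
    // The instance method ConcurrentDictionary.ToArray() takes the snapshot;
    // LINQ then only ever sees that private array.
    lastKeys = dictionary.ToArray()
                         .OrderBy(pair => pair.Key)
                         .Select(pair => pair.Key)
                         .ToArray();
}

cts.Cancel();
writer.Wait();
cts.Dispose();
Console.WriteLine($"Sorted {lastKeys.Length} keys without an exception");
```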
In my program, I'm iterating through a list of 'Group' objects using a Parallel.ForEach loop. Inside this loop, I first check whether a key exists in my ConcurrentDictionary, and whether its value contains a Group property. I then add an object to a list depending on whether or not the dictionary has the key and value. Code shown below:
var roleUsers = new ExtendedBindingList<RoleUser>();
ConcurrentDictionary<int, List<int>> roleMatch = new ConcurrentDictionary<int, List<int>>();
Parallel.ForEach(groupsWithRole, group =>
{
foreach (var u in usersInThisGroup[group.GroupID])
{
if (roleMatch.ContainsKey(u.UserID) && roleMatch[u.UserID].Contains(group.RoleID))
continue;
//
//Unimportant logic
//
lock (writelock)
{
roleUsers.Add(roleUser);
if (!roleMatch.ContainsKey(u.UserID))
roleMatch.TryAdd(u.UserID, new List<int>());
roleMatch[u.UserID].Add(group.RoleID);
}
}
});
I've come to find that the list roleUsers doesn't always have the same number of objects in it, when it very much should. It's quite obviously a threading issue. My question is, is there any way, besides locking the whole thing, to read and write to the concurrentdictionary safely?
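One possible direction, offered as a hedged sketch and not as an answer from the original thread: the check near the top of the loop and the later add are separate steps, so two threads can both pass the check for the same user and role. Collapsing them into a single atomic TryAdd on a composite key removes that gap. The tuple data below is a hypothetical stand-in for the question's groups and users, and a ConcurrentBag stands in for the ExtendedBindingList, which is assumed not to be thread-safe.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-in data: (userId, roleId) pairs, including duplicates.
var assignments = new (int UserId, int RoleId)[]
{
    (1, 10), (1, 10), (2, 10), (1, 20), (2, 10)
};

var seen = new ConcurrentDictionary<(int UserId, int RoleId), byte>();
var roleUsers = new ConcurrentBag<(int UserId, int RoleId)>();

Parallel.ForEach(assignments, a =>
{
    // TryAdd is atomic: exactly one thread gets true for a given pair,
    // so the "already matched?" check and the claim cannot be interleaved.
    if (seen.TryAdd(a, 0))
        roleUsers.Add(a);
});

Console.WriteLine(roleUsers.Count); // prints 3
```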
How to write a thread-safe list using copy-on-write model in .NET?
Below is my current implementation, but after lots of reading about threading, memory barriers, etc, I know that I need to be cautious when multi-threading without locks is involved. Could someone comment if this is the correct implementation?
class CopyOnWriteList
{
private List<string> list = new List<string>();
private object listLock = new object();
public void Add(string item)
{
lock (listLock)
{
list = new List<string>(list) { item };
}
}
public void Remove(string item)
{
lock (listLock)
{
var tmpList = new List<string>(list);
tmpList.Remove(item);
list = tmpList;
}
}
public bool Contains(string item)
{
return list.Contains(item);
}
public string Get(int index)
{
return list[index];
}
}
EDIT
To be more specific: is the above code thread-safe, or should I add something more? Also, will all threads eventually see the change to the list reference? Or should I add the volatile keyword to the list field, or a Thread.MemoryBarrier in the Contains method between reading the reference and calling a method on it?
Here is for example Java implementation, looks like my above code, but is such approach also thread-safe in .NET?
And here is the same question, but also in Java.
Here is another question related to this one.
The implementation is correct because reference assignment is atomic, in accordance with Atomicity of variable references. I would add volatile to the list field.
Your approach looks correct, but I'd recommend using a string[] rather than a List<string> to hold your data. When you're adding an item, you know exactly how many items are going to be in the resulting collection, so you can create a new array of exactly the size required. When removing an item, you can grab a copy of the list reference and search it for your item before making a copy; if it turns out that the item doesn't exist, there's no need to remove it. If it does exist, you can create a new array of the exact required size, and copy to the new array all the items preceding or following the item to be removed.
Another thing you might want to consider would be to use a int[1] as your lock flag, and use a pattern something like:
static string[] withAddedItem(string[] oldList, string dat)
{
    string[] result = new string[oldList.Length + 1];
    Array.Copy(oldList, result, oldList.Length);
    result[oldList.Length] = dat; // store the new item in the last slot
    return result;
}
int Add(string dat) // Returns index of newly-added item
{
    string[] oldList, newList;
    if (listLock[0] == 0)
    {
        // Fast path: no contention flagged, so try a lock-free swap
        oldList = list;
        newList = withAddedItem(oldList, dat);
        if (System.Threading.Interlocked.CompareExchange(ref list, newList, oldList) == oldList)
            return newList.Length - 1;
    }
    // Slow path: flag contention and serialize writers behind the lock
    System.Threading.Interlocked.Increment(ref listLock[0]);
    lock (listLock)
    {
        do
        {
            oldList = list;
            newList = withAddedItem(oldList, dat);
        } while (System.Threading.Interlocked.CompareExchange(ref list, newList, oldList) != oldList);
    }
    System.Threading.Interlocked.Decrement(ref listLock[0]);
    return newList.Length - 1;
}
If there is no write contention, the CompareExchange will succeed without having to acquire a lock. If there is write contention, writes will be serialized by the lock. Note that the lock here is neither necessary nor sufficient to ensure correctness. Its purpose is to avoid thrashing in the event of write contention. It is possible that thread #1 might get past its first "if" test, and get task-switched out while many other threads simultaneously try to write the list and start using the lock. If that occurs, thread #1 might then "surprise" the thread in the lock by performing its own CompareExchange. Such an action would result in the lock-holding thread having to waste time making a new array, but that situation should arise rarely enough that the occasional cost of an extra array copy shouldn't matter.
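Since the lock is described as an optimization and not a correctness requirement, the core of the pattern can be sketched as a plain compare-and-swap retry loop. This is a simplified variant without the lock fallback, not the answer's exact code; the local function and variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

string[] items = Array.Empty<string>();

// Lock-free add: build a new array, then publish it only if nobody else published first.
void Add(string item)
{
    while (true)
    {
        string[] oldItems = Volatile.Read(ref items);
        var newItems = new string[oldItems.Length + 1];
        Array.Copy(oldItems, newItems, oldItems.Length);
        newItems[oldItems.Length] = item;

        // CompareExchange returns the value it found; if that is oldItems, our swap won.
        if (ReferenceEquals(Interlocked.CompareExchange(ref items, newItems, oldItems), oldItems))
            return;
        // Otherwise another writer got in first: retry against the newer array.
    }
}

Parallel.For(0, 1000, i => Add("item " + i));

Console.WriteLine(Volatile.Read(ref items).Length); // prints 1000
```

Under heavy write contention this loop repeats the array copy on every failed swap, which is exactly the thrashing the lock in the answer's version is meant to limit.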
Yes, it is thread-safe:
Collection modifications in Add and Remove are done on separate collections, so it avoids concurrent access to the same collection from Add and Remove or from Add/Remove and Contains/Get.
Assignment of the new collection is done inside lock, which is just pair of Monitor.Enter and Monitor.Exit, which both do a full memory barrier as noted here, which means that after the lock all threads should observe the new value of list field.
How do I implement AddOrUpdate in ConcurrentDictionary so that I can correctly update the value, if that value is a collection?
My concern is that since TValue is a reference type I might run into the situation where I call TValue multiple times in a race condition. I would test this myself, but my syntax is wrong so I can't proceed further.
What must I change to make this work?
public class TrustList : ConcurrentDictionary<int, List<TrustRelationshipDetail>>
{
public void AddOrUpdateTrustDetail(TrustRelationshipDetail detail)
{
List<TrustRelationshipDetail> detailList = new List<TrustRelationshipDetail>();
detailList.Add(detail);
this.AddOrUpdate(detail.HierarchyDepth, detailList, (key, oldValue) =>
oldValue.Add(detail) // <--- Compiler doesn't like this, and I think this may cause duplicates if this were to be called...
);
}
}
The purpose of AddOrUpdate() is to replace any existing value with a new value.
Since you only need to get the existing value (in order to then modify it), you want GetOrAdd():
this.GetOrAdd(detail.HierarchyDepth, new ConcurrentBag<TrustRelationshipDetail>())
.Add(detail);
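A self-contained sketch of this GetOrAdd pattern, assuming the dictionary's value type is changed to ConcurrentBag as the snippet above implies. Here string stands in for TrustRelationshipDetail, and the AddDetail helper is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Stand-in for TrustList: hierarchy depth -> details.
var trustList = new ConcurrentDictionary<int, ConcurrentBag<string>>();

void AddDetail(int hierarchyDepth, string detail)
{
    // The factory overload avoids allocating a bag on every call. The factory may
    // still run more than once under contention, but only one bag is ever stored,
    // and GetOrAdd always returns the stored one.
    trustList.GetOrAdd(hierarchyDepth, _ => new ConcurrentBag<string>()).Add(detail);
}

Parallel.For(0, 300, i => AddDetail(i % 3, "detail " + i));

Console.WriteLine(trustList.Count);    // prints 3
Console.WriteLine(trustList[0].Count); // prints 100
```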
So I have a method that gets a Dictionary of List<myObj>, then cycles through the keys of the dictionary and passes each List<myObj> to a separate thread.
Here is some Code / Pseudo-Code:
public static void ProcessEntries() {
    Dictionary<string, List<myObj>> myDictionary = GetDictionary();
    foreach (string key in myDictionary.Keys)
    {
        List<myObj> myList = myDictionary[key];
        Thread myThread = new System.Threading.Thread(new System.Threading.ThreadStart(delegate() {
            ProcessList(myList);
        }));
        myThread.Start();
    }
}
public static void ProcessList(List<myObj> myList) {
// Process entries
// read-only operations on myList
}
The problem is that during execution of ProcessList the myList parameter simply changes.
I have looped through the list before kicking off the thread, and then immediately inside the thread, and I've found the results to be different.
I have since solved the problem (I think!) by making the Dictionary variable global. Using the [ThreadStatic] attribute is next on the list of possible fixes.
What I really want to know is: why does the myList object change inside ProcessList(), presumably when the myList object is re-assigned in ProcessEntries()? Are these not two different Lists? If all parameter passing is by value by default, why does the ProcessList() function not have a local copy of myList? (Does it?)
Is there a way to specify that you want to pass a parameter to a thread and not have it be altered by the parent thread or other threads during execution? (This would be similar to the [ThreadSafe] attribute for global variables)
I suspect your pseudo-code isn't actually an accurate reflection of your real code. I suspect your real code looks like this:
foreach(var pair in myDictionary)
{
Thread myThread = new Thread(delegate() {
ProcessList(pair.Value);
});
myThread.Start();
}
If that's the case, the problem is that the pair variable is being captured - so by the time your thread starts, it may be referring to a different key/value pair.
The way to fix it is to make the code more like your pseudo-code:
foreach(var pair in myDictionary)
{
// You'll get a new list variable on each iteration
var list = pair.Value;
Thread myThread = new Thread(delegate() {
ProcessList(list);
});
myThread.Start();
}
See Eric Lippert's blog post on this for more information.
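The capture behaviour can be reproduced without threads. One caveat: since C# 5 the foreach loop variable is a fresh variable on each iteration, so the sketch below uses a for loop, whose variable is still shared across iterations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var shared = new List<Func<int>>();
for (int i = 0; i < 3; i++)
    shared.Add(() => i);            // every lambda captures the same variable i

var copied = new List<Func<int>>();
for (int i = 0; i < 3; i++)
{
    int copy = i;                   // a fresh variable on each iteration
    copied.Add(() => copy);
}

// The lambdas run after the loops have finished, like a thread that starts late.
Console.WriteLine(string.Join(",", shared.Select(f => f()))); // prints 3,3,3
Console.WriteLine(string.Join(",", copied.Select(f => f()))); // prints 0,1,2
```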
If this isn't what's going wrong, please give a real example rather than pseudo-code. A short but complete example demonstrating the problem would be ideal.
Also be sure that other threads don't affect the thread you're trying to use. Be sure to use locks and monitors. I had some issues with that just a few weeks ago.
You are passing a reference by value in that case, so if you modify the object it refers to somewhere, the change will be visible everywhere.
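A small sketch of what "passing a reference by value" means in practice. The Mutate and Reassign helpers are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

// The parameter is a copy of the reference, so it points at the caller's list object.
void Mutate(List<int> items) => items.Add(42);

// Reassigning the parameter only redirects the local copy of the reference.
void Reassign(List<int> items)
{
    items = new List<int> { 99 };
    items.Add(100); // modifies the new list, which the caller never sees
}

var list = new List<int> { 1 };
Mutate(list);   // visible to the caller: same object
Reassign(list); // invisible to the caller: different object

Console.WriteLine(string.Join(",", list)); // prints 1,42
```

So the callee does get its own copy of the reference, but not its own copy of the list; a thread that needs an isolated list has to be handed an explicit copy, for example new List<int>(list).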