Parallel.ForEach on List<Object> Thread Safety

As far as thread safety goes, is this OK to do, or do I need to use a different collection?
List<FileMemberEntity> fileInfo = getList();
Parallel.ForEach(fileInfo, fileMember =>
{
//Modify each fileMember
});

As long as you are only modifying the contents of the item that is passed to the method, there is no locking needed.
(Provided, of course, that there are no duplicate references in the list, i.e. two references to the same FileMemberEntity instance.)
If you need to modify the list itself, create a copy that you can iterate, and use a lock when you modify the list:
List<FileMemberEntity> fileInfo = getList();
List<FileMemberEntity> copy = new List<FileMemberEntity>(fileInfo);
object sync = new Object();
Parallel.ForEach(copy, fileMember => {
// do something
lock (sync) {
// here you can add or remove items from the fileInfo list
}
// do something
});

You're safe since you are just reading. Just don't modify the list while you are iterating over its items.
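A minimal sketch of the safe case described in the answers above: every iteration changes only the item it was handed, and the list itself is only enumerated. The FileMemberEntity class and its Size property here are hypothetical stand-ins for the question's type.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's FileMemberEntity.
public class FileMemberEntity
{
    public string Name { get; set; }
    public int Size { get; set; }
}

public static class PerItemUpdate
{
    public static List<FileMemberEntity> Run()
    {
        List<FileMemberEntity> fileInfo = Enumerable.Range(0, 1000)
            .Select(i => new FileMemberEntity { Name = "file" + i, Size = i })
            .ToList();

        // Each iteration touches only its own item, so no lock is needed.
        // The list is never added to or removed from while it is enumerated.
        Parallel.ForEach(fileInfo, fileMember =>
        {
            fileMember.Size *= 2;
        });

        return fileInfo;
    }
}
```

This only holds while each instance appears once in the list; two references to the same instance would let two iterations write to the same object.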

We can take fewer locks to make this faster. Let each worker thread of Parallel.ForEach collect into its own local list, and lock only once per worker when merging the local results:
List<FileMemberEntity> copy = new List<FileMemberEntity>(fileInfo);
object sync = new Object();
Parallel.ForEach<FileMemberEntity, List<FileMemberEntity>>(
copy,
() => { return new List<FileMemberEntity>(); },
(itemInCopy, state, localList) =>
{
// collect into the thread-local list; no lock is needed here
localList.Add(itemInCopy);
return localList;
},
// merge once per worker into fileInfo; never modify copy while it is being iterated
(finalResult) => { lock (sync) fileInfo.AddRange(finalResult); }
);
// do something
Reference: http://msdn.microsoft.com/en-gb/library/ff963547.aspx

If it does not matter what order the FileMemberEntity objects are acted on, you can use List<T> because you are not modifying the list.
If you must ensure some sort of ordering, you can use OrderablePartitioner<T> as a base class and implement an appropriate partitioning scheme. For example, if the FileMemberEntity has some sort of categorization and you must process each of the categories in some specific order, you would want to go this route.
Hypothetically if you have
Object 1 Category A
Object 2 Category A
Object 3 Category B
there is no guarantee that Object 2 Category A will be processed before Object 3 Category B is processed when iterating a List<T> using Parallel.ForEach.
The MSDN documentation you link to provides an example of how to do that.
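If the only requirement is that one category finishes before the next one starts, a simpler alternative to a custom OrderablePartitioner<T> is to loop over the categories sequentially and parallelize within each category. This is a sketch of that swapped-in approach, not the partitioner technique from the answer; CategorizedItem and its properties are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical item type with a category, as in the Object 1..3 example.
public class CategorizedItem
{
    public int Id { get; set; }
    public string Category { get; set; }
}

public static class CategoryOrder
{
    // Returns the category of each item in the order the items were processed.
    public static List<string> Run()
    {
        var items = new List<CategorizedItem>
        {
            new CategorizedItem { Id = 1, Category = "A" },
            new CategorizedItem { Id = 2, Category = "A" },
            new CategorizedItem { Id = 3, Category = "B" },
        };
        var log = new ConcurrentQueue<string>();

        // Categories run one after another; items inside a category run in parallel.
        foreach (var group in items.GroupBy(i => i.Category).OrderBy(g => g.Key))
        {
            // Parallel.ForEach returns only after every item in the group is done.
            Parallel.ForEach(group, item => log.Enqueue(item.Category));
        }
        return log.ToList();
    }
}
```

Within a category the order is still unspecified; only the boundary between categories is enforced.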

Related

Updating List in ConcurrentDictionary

So I have a IList as the value in my ConcurrentDictionary.
ConcurrentDictionary<int, IList<string>> list1 = new ConcurrentDictionary<int, IList<string>>();
In order to update a value in a list I do this:
if (list1.ContainsKey(key))
{
IList<string> templist;
list1.TryGetValue(key, out templist);
templist.Add("helloworld");
}
However, does adding a string to templist update the ConcurrentDictionary? If so, is the update thread-safe so that no data corruption would occur?
Or is there a better way to update or create a list inside the ConcurrentDictionary
EDIT
If I were to use a ConcurrentBag instead of a List, how would I implement this? More specifically, how could I update it? ConcurrentDictionary's TryUpdate method feels a bit excessive.
Does ConcurrentBag.Add update the ConcurrentDictionary in a thread-safe manner?
ConcurrentDictionary<int, ConcurrentBag<string>> list1 = new ConcurrentDictionary<int, ConcurrentBag<string>>();

Firstly, there's no need to do ContainsKey() and TryGetValue().
You should just do this:
IList<string> templist;
if (list1.TryGetValue(key, out templist))
templist.Add("helloworld");
In fact your code as written has a race condition.
In between one thread calling ContainsKey() and TryGetValue(), a different thread may have removed the item with that key. TryGetValue() will then set templist to null, and you'll get a null reference exception when you call templist.Add().
Secondly, yes: There's another possible threading issue here. You don't know that the IList<string> stored inside the dictionary is threadsafe.
Therefore calling templist.Add() is not guaranteed to be safe.
You could use ConcurrentQueue<string> instead of IList<string>. This is probably going to be the most robust solution.
Note that simply locking access to the IList<string> wouldn't be sufficient.
This is no good:
if (list1.TryGetValue(key, out templist))
{
lock (locker)
{
templist.Add("helloworld");
}
}
unless you also use the same lock everywhere else that the IList may be accessed. This is not easy to achieve, hence it's better to either use a ConcurrentQueue<> or add locking to this class and change the architecture so that no other threads have access to the underlying IList.
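As a rough sketch of the ConcurrentQueue<string> suggestion: GetOrAdd returns the one queue stored for the key, and Enqueue is itself thread-safe, so no extra lock is involved. The key value and the item count are arbitrary choices for the illustration.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class QueueValues
{
    public static int Run()
    {
        var list1 = new ConcurrentDictionary<int, ConcurrentQueue<string>>();
        const int key = 42; // arbitrary key for the illustration

        Parallel.For(0, 1000, i =>
        {
            // GetOrAdd creates the queue on first use and afterwards returns
            // the queue that is actually stored in the dictionary.
            ConcurrentQueue<string> queue =
                list1.GetOrAdd(key, _ => new ConcurrentQueue<string>());
            queue.Enqueue("helloworld");
        });

        return list1[key].Count;
    }
}
```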

Operations on a thread-safe dictionary are thread-safe per key, so to speak. So as long as you access your values (in this case an IList<T>) from only one thread, you're good to go.
The ConcurrentDictionary does not prevent two threads from accessing the value belonging to one key at the same time.

You can use the ConcurrentDictionary.AddOrUpdate method to add an item to the list. It's simpler and should work fine (but see the update below).
var list1 = new ConcurrentDictionary<int, IList<string>>();
list1.AddOrUpdate(key,
new List<string>() { "test" }, (k, l) => { l.Add("test"); return l;});
UPD
According to the docs and the source, the factory delegates passed to AddOrUpdate run outside the dictionary's lock, so calling List methods inside a factory delegate is NOT thread-safe.
See comments under this answer.

The ConcurrentDictionary has no effect on whether you can apply changes to value objects in a thread-safe manner or not. That is the responsibility of the value object (the IList implementation in your case).
Looking at the answers to No ConcurrentList<T> in .Net 4.0?, there are some good reasons why there is no ConcurrentList implementation in .NET.
Basically you have to take care of thread-safe changes yourself. The simplest way is to use the lock statement. E.g.
lock (templist)
{
templist.Add("hello world");
}
Another way is to use the ConcurrentBag in the .NET Framework. But this is only useful if you do not rely on the IList interface or on the ordering of items.
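A small sketch of the lock approach combined with GetOrAdd, assuming every piece of code that reads or writes a stored list takes the same lock (here, the list instance itself). The helper name AddItem is made up for the example.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LockedListValues
{
    // Hypothetical helper: adds one item to the list stored under key.
    public static void AddItem(
        ConcurrentDictionary<int, List<string>> dict, int key, string item)
    {
        // The dictionary guarantees a single stored list per key...
        List<string> list = dict.GetOrAdd(key, _ => new List<string>());

        // ...but the list itself is not thread-safe, so guard every access to it.
        lock (list)
        {
            list.Add(item);
        }
    }

    public static int Run()
    {
        var dict = new ConcurrentDictionary<int, List<string>>();
        Parallel.For(0, 1000, i => AddItem(dict, 1, "hello world"));

        List<string> stored = dict[1];
        lock (stored) // readers must take the same lock
        {
            return stored.Count;
        }
    }
}
```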

It has already been mentioned that the best solution would be a ConcurrentDictionary with a ConcurrentBag. I'm just going to add how to do that:
ConcurrentBag<string> bag= new ConcurrentBag<string>();
bag.Add("inputString");
list1.AddOrUpdate(key,bag,(k,v)=>{
v.Add("inputString");
return v;
});

does adding a string to templist update the ConcurrentDictionary?
It does not.
Your thread safe collection (Dictionary) holds references to non-thread-safe collections (IList). So changing those is not thread safe.
I suppose you should consider using mutexes.

If you use ConcurrentBag<T>:
var dic = new ConcurrentDictionary<int, ConcurrentBag<string>>();
Something like this could work OK:
public static class ConcurentDictionaryExt
{
public static ConcurrentBag<V> AddToInternal<K, V>(this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, V value)
=> dic.AddOrUpdate(key,
k => new ConcurrentBag<V>() { value },
(k, existingBag) =>
{
existingBag.Add(value);
return existingBag;
}
);
public static ConcurrentBag<V> AddRangeToInternal<K, V>(this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, IEnumerable<V> values)
=> dic.AddOrUpdate(key,
k => new ConcurrentBag<V>(values),
(k, existingBag) =>
{
foreach (var v in values)
existingBag.Add(v);
return existingBag;
}
);
}
I didn't test it yet :)
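One thing to watch with AddOrUpdate is that its delegates run outside the dictionary's lock and may be invoked more than once, so a delegate with a side effect such as existingBag.Add(value) is worth avoiding. A possible variant, sketched here with a made-up extension name, keeps the factory free of side effects by using GetOrAdd and adding to the returned bag afterwards.

```csharp
using System.Collections.Concurrent;

// Hypothetical extension class; the names are illustrative only.
public static class BagDictionaryExtensions
{
    public static void AddToBag<K, V>(
        this ConcurrentDictionary<K, ConcurrentBag<V>> dic, K key, V value)
    {
        // The factory only builds an empty bag, so running it more than once
        // has no visible effect. GetOrAdd returns the bag that is stored.
        dic.GetOrAdd(key, _ => new ConcurrentBag<V>()).Add(value);
    }
}
```

Usage would look like dic.AddToBag(1, "inputString"); from any number of threads.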

How to write copy-on-write list in .NET

How to write a thread-safe list using copy-on-write model in .NET?
Below is my current implementation, but after lots of reading about threading, memory barriers, etc, I know that I need to be cautious when multi-threading without locks is involved. Could someone comment if this is the correct implementation?
class CopyOnWriteList
{
private List<string> list = new List<string>();
private object listLock = new object();
public void Add(string item)
{
lock (listLock)
{
list = new List<string>(list) { item };
}
}
public void Remove(string item)
{
lock (listLock)
{
var tmpList = new List<string>(list);
tmpList.Remove(item);
list = tmpList;
}
}
public bool Contains(string item)
{
return list.Contains(item);
}
public string Get(int index)
{
return list[index];
}
}
EDIT
To be more specific: is the above code thread-safe, or should I add something more? Also, will all threads eventually see the change to the list reference? Or should I add the volatile keyword to the list field, or a Thread.MemoryBarrier in the Contains method between reading the reference and calling a method on it?
Here, for example, is a Java implementation that looks like my code above, but is such an approach also thread-safe in .NET?
And here is the same question, but also in Java.
Here is another question related to this one.

The implementation is correct because reference assignment is atomic (see Atomicity of variable references). I would add volatile to the list field.
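For illustration, this is roughly what the volatile suggestion could look like. It is a sketch rather than a reviewed implementation: the class name is made up, writers still serialize on a lock, and readers take one snapshot of the reference and work only on that.

```csharp
using System.Collections.Generic;

public class VolatileCopyOnWriteList
{
    // volatile: readers always fetch the most recently published reference.
    private volatile List<string> list = new List<string>();
    private readonly object listLock = new object();

    public void Add(string item)
    {
        lock (listLock)
        {
            // Build the new list completely before publishing it.
            var copy = new List<string>(list) { item };
            list = copy;
        }
    }

    public void Remove(string item)
    {
        lock (listLock)
        {
            var copy = new List<string>(list);
            copy.Remove(item);
            list = copy;
        }
    }

    public bool Contains(string item)
    {
        // Read the field once; a published list is never mutated afterwards.
        List<string> snapshot = list;
        return snapshot.Contains(item);
    }
}
```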

Your approach looks correct, but I'd recommend using a string[] rather than a List<string> to hold your data. When you're adding an item, you know exactly how many items are going to be in the resulting collection, so you can create a new array of exactly the size required. When removing an item, you can grab a copy of the list reference and search it for your item before making a copy; if it turns out that the item doesn't exist, there's no need to remove it. If it does exist, you can create a new array of the exact required size, and copy to the new array all the items preceding or following the item to be removed.
Another thing you might want to consider would be to use an int[1] as your lock flag, and use a pattern something like:
static string[] withAddedItem(string[] oldList, string dat)
{
string[] result = new string[oldList.Length+1];
Array.Copy(oldList, result, oldList.Length);
result[oldList.Length] = dat; // store the new item in the last slot
return result;
}
int Add(string dat) // Returns index of newly-added item
{
string[] oldList, newList;
if (listLock[0] == 0) // no writer is currently waiting on the lock
{
oldList = list;
newList = withAddedItem(oldList, dat);
// publish only if nobody replaced the array in the meantime
if (System.Threading.Interlocked.CompareExchange(ref list, newList, oldList) == oldList)
return newList.Length - 1;
}
System.Threading.Interlocked.Increment(ref listLock[0]);
lock (listLock)
{
do
{
oldList = list;
newList = withAddedItem(oldList, dat);
} while (System.Threading.Interlocked.CompareExchange(ref list, newList, oldList) != oldList);
}
System.Threading.Interlocked.Decrement(ref listLock[0]);
return newList.Length - 1;
}
If there is no write contention, the CompareExchange will succeed without having to acquire a lock. If there is write contention, writes will be serialized by the lock. Note that the lock here is neither necessary nor sufficient to ensure correctness. Its purpose is to avoid thrashing in the event of write contention. It is possible that thread #1 might get past its first "if" test, and get task-switched out while many other threads simultaneously try to write the list and start using the lock. If that occurs, thread #1 might then "surprise" the thread in the lock by performing its own CompareExchange. Such an action would result in the lock-holding thread having to waste time making a new array, but that situation should arise rarely enough that the occasional cost of an extra array copy shouldn't matter.
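Since the lock in the pattern above is only there to reduce thrashing, the core of the technique is the compare-and-swap retry loop. A stripped-down sketch of just that loop, with a made-up class name and without the contention lock, might look like this:

```csharp
using System;
using System.Threading;

public class CasStringList
{
    private string[] items = new string[0];

    // Returns the index of the newly added item.
    public int Add(string value)
    {
        while (true)
        {
            string[] oldItems = Volatile.Read(ref items);

            var newItems = new string[oldItems.Length + 1];
            Array.Copy(oldItems, newItems, oldItems.Length);
            newItems[oldItems.Length] = value;

            // Publish only if no other writer replaced the array meanwhile;
            // otherwise loop and rebuild from the newer array.
            if (Interlocked.CompareExchange(ref items, newItems, oldItems) == oldItems)
                return oldItems.Length;
        }
    }

    public string[] Snapshot()
    {
        return Volatile.Read(ref items);
    }
}
```

Under heavy write contention every failed attempt costs an array copy, which is the thrashing the lock in the answer is meant to limit.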

Yes, it is thread-safe:
Collection modifications in Add and Remove are done on separate collections, so it avoids concurrent access to the same collection from Add and Remove or from Add/Remove and Contains/Get.
Assignment of the new collection is done inside a lock, which is just a pair of Monitor.Enter and Monitor.Exit calls. Both perform a full memory barrier, as noted here, which means that after the lock all threads should observe the new value of the list field.

c# how can i change the values of the original list when i process the list which extracted from the original list using List.Where

I have a list that contains many Blocks, and the Blocks are divided into old ones and new ones, using a property 'IsOld' in class 'Block' to distinguish them. Now I extract the old ones like this:
List<Block> olds = blocks.Where(item => {return item.IsOld;}).ToList();
After I change some property values on the items in the 'olds' list, the corresponding items in the original list 'blocks' do not change, and I don't know how to deal with that. Does this mean that the List.Where method makes a deep copy of every item?
'Block' is a struct; at first I thought it was a class.
Sorry to trouble you because of my carelessness o(︶︿︶)o

You are saying that you are changing values of properties, and not replacing the objects themselves. Therefore the original objects in the blocks list should also be updated, provided Block is defined as a class. If Block is a struct, then every assignment works on a copy of the original object, and your changes in the olds list will not be reflected in the original blocks list.
Is Block a struct?
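To make the difference concrete, here is a small comparison. BlockStruct and BlockClass are hypothetical types with just enough fields for the demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public struct BlockStruct { public bool IsOld; public int Z; }
public class BlockClass { public bool IsOld; public int Z; }

public static class CopySemantics
{
    public static int StructCase()
    {
        var blocks = new List<BlockStruct> { new BlockStruct { IsOld = true, Z = 1 } };
        List<BlockStruct> olds = blocks.Where(b => b.IsOld).ToList();

        // olds holds its own copies of the struct values.
        BlockStruct changed = olds[0];
        changed.Z = 99;
        olds[0] = changed;

        return blocks[0].Z; // still 1: the original list is untouched
    }

    public static int ClassCase()
    {
        var blocks = new List<BlockClass> { new BlockClass { IsOld = true, Z = 1 } };
        List<BlockClass> olds = blocks.Where(b => b.IsOld).ToList();

        // olds holds references to the same objects as blocks.
        olds[0].Z = 99;

        return blocks[0].Z; // 99: both lists see the same object
    }
}
```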

If you want to change the properties of the "old" items, why not just do:
blocks.ForEach(item =>
{
if (item.IsOld)
{
// Do your changes here ...
}
});

Instead of what you are doing, try this:
blocks.ForEach(item => { if(item.IsOld){/*perform change here*/} });
Also, if you want to see where you are going wrong, paste the code where you make the changes; that would make it clearer.

Since block is a struct, you could do this:
List<Block> updatedBlocks = blocks
.Select(item => item.IsOld ? new Block { X=item.X, Y=item.Y, Z=updatedZ } : item)
.ToList();
More readably, define a method
private Block UpdateBlock(Block sourceValue)
{
if (!sourceValue.IsOld)
return sourceValue;
var result = new Block // ...
// populate result here...
return result;
}
Then call it like this:
List<Block> updatedBlocks = blocks.Select(UpdateBlock).ToList();
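If building a second list is not wanted, another option for a struct element type is to update the original list in place by index: read the element out, change the copy, and write it back. This is a sketch with a hypothetical Block definition; the real struct's members may differ.

```csharp
using System.Collections.Generic;

// Hypothetical shape of Block; only the fields used below matter.
public struct Block
{
    public bool IsOld;
    public int X, Y, Z;
}

public static class InPlaceUpdate
{
    public static void UpdateOld(List<Block> blocks, int updatedZ)
    {
        for (int i = 0; i < blocks.Count; i++)
        {
            if (!blocks[i].IsOld)
                continue;

            Block b = blocks[i]; // the indexer returns a copy of the struct
            b.Z = updatedZ;      // change the copy
            blocks[i] = b;       // store the changed copy back in the list
        }
    }
}
```

A for loop is used because a foreach loop does not allow assigning to the list while enumerating it.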

List threading issue

I'm trying to make my application thread safe. I hold my hands up and admit I'm new to threading so not sure what way to proceed.
To give a simplified version, my application contains a list.
Most of the application accesses this list and doesn't change it but may enumerate through it. All this happens on the UI thread.
Thread one will periodically look for items to be added to and removed from the list.
Thread two will enumerate the list and update the items with extra information. This has to run at the same time as thread one, as it can take anything from seconds to hours.
The first question is: does anyone have a recommended strategy for this?
Secondly, I was trying to make separate copies of the list that the main application will use, periodically getting a new copy when something is updated, added or removed, but this doesn't seem to be working.
I have my list and a copy......
public class MDGlobalObjects
{
public List<T> mainList= new List<T>();
public List<T> copyList
{
get
{
return new List<T>(mainList);
}
}
}
If I get copyList, modify it, save mainlist, restart my application, load mainlist and look again at copylist then the changes are present. I presume I've done something wrong as copylist seems to still refer to mainlist.
I'm not sure if it makes a difference but everything is accessed through a static instance of the class.
public static MDGlobalObjects CacheObjects = new MDGlobalObjects();

This is the gist using a ConcurrentDictionary:
public class Element
{
public string Key { get; set; }
public string Property { get; set; }
public Element CreateCopy()
{
return new Element
{
Key = this.Key,
Property = this.Property,
};
}
}
var d = new ConcurrentDictionary<string, Element>();
// thread 1
// prune
foreach ( var kv in d )
{
if ( kv.Value.Property == "ToBeRemoved" )
{
Element dummy = null;
d.TryRemove( kv.Key, out dummy );
}
}
// thread 1
// add
Element toBeAdded = new Element();
// set basic properties here
d.TryAdd( toBeAdded.Key, toBeAdded );
// thread 2
// populate element
Element unPopulated = null;
if ( d.TryGetValue( "ToBePopulated", out unPopulated ) )
{
Element nowPopulated = unPopulated.CreateCopy();
nowPopulated.Property = "Populated";
// either
d.TryUpdate( unPopulated.Key, nowPopulated, unPopulated );
// or
d.AddOrUpdate( unPopulated.Key, nowPopulated, ( key, value ) => nowPopulated );
}
// read threads
// enumerate
foreach ( Element element in d.Values )
{
// do something with each element
}
// read threads
// try to get specific element
Element specific = null;
if ( d.TryGetValue( "SpecificKey", out specific ) )
{
// do something with specific element
}
In thread 2, if you can set properties so that the whole object is consistent after each atomic write, then you can skip making a copy and just populate the properties with the object in place in the collection.
There are a few race conditions in this code, but they should be benign in that readers always have a consistent view of the collection.

Actually, copyList is just a shallow copy of mainList. The list is new, but the references to the objects contained in the list are still the same. To achieve what you are trying to do, you have to make a deep copy of the list,
something like this:
public static IEnumerable<T> Clone<T>(this IEnumerable<T> collection) where T : ICloneable
{
return collection.Select(item => (T)item.Clone());
}
and use it like
return mainList.Clone();
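The difference between the two kinds of copy can be seen in a short comparison. Item and its Copy method are hypothetical and stand in for whatever element type and cloning mechanism the real list uses.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type with a hand-written copy method.
public class Item
{
    public string Name { get; set; }
    public Item Copy() { return new Item { Name = Name }; }
}

public static class CopyDemo
{
    public static string Shallow()
    {
        var mainList = new List<Item> { new Item { Name = "original" } };
        var copyList = new List<Item>(mainList); // new list, same Item objects

        copyList[0].Name = "changed";
        return mainList[0].Name; // "changed"
    }

    public static string Deep()
    {
        var mainList = new List<Item> { new Item { Name = "original" } };
        var copyList = mainList.Select(i => i.Copy()).ToList(); // new Item objects

        copyList[0].Name = "changed";
        return mainList[0].Name; // "original"
    }
}
```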

Looking at your question again, I would like to suggest an overall change of approach.
You should use a ConcurrentDictionary, as you are using .NET 4.0. With it you won't have to use locks, as a concurrent collection always maintains a valid state.
So your code will look something like this.
Thread 1's code:
var obj = download_the_object();
dic.TryAdd("SomeUniqueKeyOfTheObject", obj);
// TryAdd may return false (for example if the key already exists), so implement some sort of retry mechanism
Thread 2's code:
foreach (var item in dic)
{
var obj = item.Value;
var extraInfo = downloadExtraInfoforObject(obj);
// replace the stored object with a new one that has the extra info added, using TryUpdate
dic.TryUpdate(obj.uniqueKey, someNewObjectWithExtraInfoAdded, obj);
}

ToList()-- does it create a new list?

Let's say I have a class
public class MyObject
{
public int SimpleInt{get;set;}
}
And I have a List<MyObject>, and I call ToList() on it and then change one of the SimpleInt values. Will my change be propagated back to the original list? In other words, what would be the output of the following method?
public void RunChangeList()
{
var objs = new List<MyObject>(){new MyObject(){SimpleInt=0}};
var whatInt = ChangeToList(objs );
}
public int ChangeToList(List<MyObject> objects)
{
var objectList = objects.ToList();
objectList[0].SimpleInt=5;
return objects[0].SimpleInt;
}
Why?
P/S: I'm sorry if it seems obvious to find out. But I don't have compiler with me now...

Yes, ToList will create a new list, but because in this case MyObject is a reference type then the new list will contain references to the same objects as the original list.
Updating the SimpleInt property of an object referenced in the new list will also affect the equivalent object in the original list.
(If MyObject was declared as a struct rather than a class then the new list would contain copies of the elements in the original list, and updating a property of an element in the new list would not affect the equivalent element in the original list.)

From the Reflector'd source:
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source)
{
if (source == null)
{
throw Error.ArgumentNull("source");
}
return new List<TSource>(source);
}
So yes, your original list won't be updated (i.e. additions or removals) however the referenced objects will.

ToList will always create a new list, which will not reflect any subsequent changes to the collection.
However, it will reflect changes to the objects themselves (Unless they're mutable structs).
In other words, if you replace an object in the original list with a different object, the ToList will still contain the first object.
However, if you modify one of the objects in the original list, the ToList will still contain the same (modified) object.
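The three cases (modifying an object, replacing an object, and adding to the list) can be checked side by side. This sketch reuses the question's MyObject class; the ToListDemo wrapper is made up for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int SimpleInt { get; set; }
}

public static class ToListDemo
{
    // Returns true when all three observations hold.
    public static bool Run()
    {
        var original = new List<MyObject> { new MyObject { SimpleInt = 0 } };
        List<MyObject> copy = original.ToList();

        // 1. Modifying a shared object is visible through both lists.
        original[0].SimpleInt = 5;
        bool modificationVisible = copy[0].SimpleInt == 5;

        // 2. Replacing an element in one list does not affect the other.
        original[0] = new MyObject { SimpleInt = 7 };
        bool replacementNotVisible = copy[0].SimpleInt == 5;

        // 3. Adding to one list does not affect the other.
        original.Add(new MyObject());
        bool additionNotVisible = copy.Count == 1;

        return modificationVisible && replacementNotVisible && additionNotVisible;
    }
}
```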

Yes, it creates a new list. This is by design.
The list will contain the same results as the original enumerable sequence, but materialized into a persistent (in-memory) collection. This allows you to consume the results multiple times without incurring the cost of recomputing the sequence.
The beauty of LINQ sequences is that they are composable. Often, the IEnumerable<T> you get is the result of combining multiple filtering, ordering, and/or projection operations. Extension methods like ToList() and ToArray() allow you to convert the computed sequence into a standard collection.

The accepted answer correctly addresses the OP's question based on his example. However, it only applies when ToList is applied to a concrete collection; it does not hold when the elements of the source sequence have yet to be instantiated (due to deferred execution). In case of the latter, you might get a new set of items each time you call ToList (or enumerate the sequence).
Here is an adaptation of the OP's code to demonstrate this behaviour:
public static void RunChangeList()
{
var objs = Enumerable.Range(0, 10).Select(_ => new MyObject() { SimpleInt = 0 });
var whatInt = ChangeToList(objs); // whatInt gets 0
}
public static int ChangeToList(IEnumerable<MyObject> objects)
{
var objectList = objects.ToList();
objectList.First().SimpleInt = 5;
return objects.First().SimpleInt;
}
Whilst the above code may appear contrived, this behaviour can appear as a subtle bug in other scenarios. See my other example for a situation where it causes tasks to get spawned repeatedly.
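A side-by-side sketch of the deferred case and one way around it, which is to materialize the sequence once and keep using that list. The DeferredDemo wrapper is hypothetical; MyObject is the class from the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int SimpleInt { get; set; }
}

public static class DeferredDemo
{
    public static int Deferred()
    {
        // The Select runs again on every enumeration, creating fresh objects.
        IEnumerable<MyObject> objs =
            Enumerable.Range(0, 10).Select(_ => new MyObject { SimpleInt = 0 });

        objs.ToList()[0].SimpleInt = 5; // changes an object nobody else holds
        return objs.First().SimpleInt;  // re-enumerates: a new object with 0
    }

    public static int Materialized()
    {
        // ToList here runs the Select once and stores the resulting objects.
        List<MyObject> objs =
            Enumerable.Range(0, 10).Select(_ => new MyObject { SimpleInt = 0 }).ToList();

        objs.ToList()[0].SimpleInt = 5; // the second list shares those objects
        return objs.First().SimpleInt;  // 5
    }
}
```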

A new list is created, but the items in it are references to the original items (just like in the original list). Changes to the list itself are independent, but changes to the items will show up in both lists.

I just stumbled upon this old post and thought I'd add my two cents. Generally, if I am in doubt, I quickly use the GetHashCode() method on any object to check the identities. So for the above:
public class MyObject
{
public int SimpleInt { get; set; }
}
class Program
{
public static void RunChangeList()
{
var objs = new List<MyObject>() { new MyObject() { SimpleInt = 0 } };
Console.WriteLine("objs: {0}", objs.GetHashCode());
Console.WriteLine("objs[0]: {0}", objs[0].GetHashCode());
var whatInt = ChangeToList(objs);
Console.WriteLine("whatInt: {0}", whatInt.GetHashCode());
}
public static int ChangeToList(List<MyObject> objects)
{
Console.WriteLine("objects: {0}", objects.GetHashCode());
Console.WriteLine("objects[0]: {0}", objects[0].GetHashCode());
var objectList = objects.ToList();
Console.WriteLine("objectList: {0}", objectList.GetHashCode());
Console.WriteLine("objectList[0]: {0}", objectList[0].GetHashCode());
objectList[0].SimpleInt = 5;
return objects[0].SimpleInt;
}
private static void Main(string[] args)
{
RunChangeList();
Console.ReadLine();
}
}
And answer on my machine -
objs: 45653674
objs[0]: 41149443
objects: 45653674
objects[0]: 41149443
objectList: 39785641
objectList[0]: 41149443
whatInt: 5
So essentially the objects that the list carries remain the same in the above code. Hope the approach helps.
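A hash code is only a hint about identity, since GetHashCode can be overridden and two different objects may share a hash. A more direct check, sketched below with a made-up IdentityCheck class, is object.ReferenceEquals, which tests whether two references point to the same instance.

```csharp
using System.Collections.Generic;
using System.Linq;

public class MyObject
{
    public int SimpleInt { get; set; }
}

public static class IdentityCheck
{
    public static bool SameList()
    {
        var objs = new List<MyObject> { new MyObject() };
        List<MyObject> copy = objs.ToList();
        return object.ReferenceEquals(objs, copy); // false: a new list instance
    }

    public static bool SameElement()
    {
        var objs = new List<MyObject> { new MyObject() };
        List<MyObject> copy = objs.ToList();
        return object.ReferenceEquals(objs[0], copy[0]); // true: same MyObject
    }
}
```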

I think that this is equivalent to asking if ToList does a deep or shallow copy. As ToList has no way to clone MyObject, it must do a shallow copy, so the created list contains the same references as the original one, so the code returns 5.

ToList will create a brand new list.
If the items in the list are value types, the new list holds its own copies, so changing them does not affect the original list. If they are reference types, any changes will be reflected back in the referenced objects.

In the case where the source object is a true IEnumerable (i.e. not just a collection packaged as an enumerable), ToList() may NOT return the same object references as in the original IEnumerable. It will return a new List of objects, but those objects may not be the same as, or even Equal to, the objects yielded by the IEnumerable when it is enumerated again.

var objectList = objects.ToList();
objectList[0].SimpleInt=5;
This will update the original object as well. The new list will contain references to the same objects as the original list. You can change the elements through either list and the update will be reflected in the other.
Now if you update a list (adding or deleting an item) that will not be reflected in the other list.

I don't see anywhere in the documentation that ToList() is always guaranteed to return a new list. If an IEnumerable is a List, it may be more efficient to check for this and simply return the same List.
The worry is that sometimes you may want to be absolutely sure that the returned List is != to the original List. Because Microsoft doesn't document that ToList will return a new List, we can't be sure (unless someone found that documentation). It could also change in the future, even if it works now.
new List(IEnumerable enumerablestuff) is guaranteed to return a new List. I would use this instead.
