Enumerating a list in a thread-safe way

Let's say I have a list in a class which will be used in a multithreaded scenario.
public class MyClass
{
    List<MyItem> _list = new List<MyItem>();
    protected object SyncRoot {
        get {
            // SyncRoot is an explicit interface member, hence the cast
            // (requires using System.Collections;)
            return ((IList)_list).SyncRoot;
        }
    }
    public void Execute1()
    {
        // Hold the lock for the whole enumeration
        lock(SyncRoot)
        {
            foreach(var item in _list) DoSomething(item);
        }
    }
    public void Execute2()
    {
        MyItem[] list;
        // Hold the lock only while copying, then work on the copy
        lock(SyncRoot)
        {
            list = _list.ToArray();
        }
        for(var i = 0; i < list.Length; i++) DoSomething(list[i]);
    }
}
The method Execute1 is the 'normal' way to enumerate the list in a thread-safe way. But what about Execute2? Is this approach still thread-safe?

Access to the list (or to the copy of it) is thread-safe in both scenarios. But of course the MyItem elements themselves are not synchronized in any way.
The second form looks a little more expensive, but it allows Add/Remove on the original list while the DoSomething() calls are running. The array acts as a kind of snapshot; if that matches your requirements, it could be useful. Note that you might as well use ToList().

It's safe as long as every other use of _list is also protected with the same lock statement. You are taking exclusive access to the list, copying its contents and then working on the copy (to which you also have exclusive access due to scoping). A bit wasteful at first sight, but a legitimate approach under certain circumstances.
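
A minimal sketch of that condition, assuming a hypothetical SafeItemList<T> wrapper (not part of the question) in which every access to the list takes the same private lock. Writers lock around Add/Remove, and readers hold the lock only long enough to copy:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: every access to _items goes through _sync.
public class SafeItemList<T>
{
    private readonly object _sync = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_sync) { _items.Add(item); }
    }

    public bool Remove(T item)
    {
        lock (_sync) { return _items.Remove(item); }
    }

    // Copy under the lock; the caller owns the returned array.
    public T[] Snapshot()
    {
        lock (_sync) { return _items.ToArray(); }
    }

    // Enumerates a snapshot, so the lock is not held while action runs
    // and action may itself call Add/Remove without breaking the loop.
    public void ForEachSnapshot(Action<T> action)
    {
        foreach (var item in Snapshot()) action(item);
    }
}
```

Items added or removed while ForEachSnapshot is running are simply not reflected in that pass, which is the snapshot behaviour described above.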

Related

How to use a snapshot while using where clause

I have the following code, which is called every 3 seconds from a thread:
public class SomeClass
{
    List<Person> _list;
    public void SetList(List<Person> list)
    {
        _list = list;
    }
    private void WorkToBeDoneEverythreeSeconds()
    {
        var filteredList = _list.Where(x => x.IsConditionValid());
        //................Use the filtered list here.........
    }
}
_list is a reference to a C# List owned by another class and passed into this class. The list is updated from a different thread in its owner class. Sometimes an update happens while the Where clause is being evaluated, and an InvalidOperationException is thrown.
What is the most efficient way to get a snapshot of the actual list when using the enumerator? I can think of creating another collection from the current collection, but doing this every 3 seconds might not be the best idea?
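
One possible approach, sketched under assumptions the question does not state: if the owner class can be changed, it can publish an immutable snapshot by replacing the whole array on every update (copy-on-write). The reader then never enumerates a collection that is being modified and does not need to copy anything itself. PersonRegistry, Current, FilterOnce and the shape of Person below are hypothetical names for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int Age { get; set; }
    // Stand-in for the question's IsConditionValid(); the rule is made up.
    public bool IsConditionValid() { return Age >= 18; }
}

// Hypothetical owner: writers build a new array and swap the reference.
public class PersonRegistry
{
    private readonly object _writeLock = new object();
    private volatile Person[] _current = new Person[0];

    // Readers get an array that is never mutated after publication.
    public Person[] Current { get { return _current; } }

    public void Add(Person person)
    {
        lock (_writeLock) // serialize writers so no update is lost
        {
            var next = new Person[_current.Length + 1];
            Array.Copy(_current, next, _current.Length);
            next[next.Length - 1] = person;
            _current = next; // publish the new snapshot
        }
    }
}

public class SomeClass
{
    private readonly PersonRegistry _registry;
    public SomeClass(PersonRegistry registry) { _registry = registry; }

    public List<Person> FilterOnce()
    {
        var snapshot = _registry.Current; // read the reference exactly once
        return snapshot.Where(x => x.IsConditionValid()).ToList();
    }
}
```

The trade-off is that every write copies the array, so this suits lists that are read far more often than they are written. If the owner cannot be changed, the remaining option is to copy the list under a lock that the owner also takes for its updates.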

Pros/Cons on Lists with subsidiary objects

Once again I need to figure out how to handle lists of subsidiary objects on our business objects.
Currently, our code often looks like this:
public class Object
{
    private List<SubsidiaryObject> subsidiaryObjects = null;
    public List<SubsidiaryObject> SubsidiaryObjects
    {
        get
        {
            if (this.subsidiaryObjects == null)
            {
                this.subsidiaryObjects = DBClass.LoadListFromDatabase();
            }
            return this.subsidiaryObjects;
        }
        set
        {
            this.subsidiaryObjects = value;
        }
    }
}
The con of this:
The property is referenced in the presentation layer and used for data binding. Releasing the reference to the actual list and replacing it with a new one leaves the GUI bound to a list that no longer has anything to do with the list on the object.
The pro of this:
Easy way of reloading the list (just set the reference to null and then get it again).
I developed another class that uses the following pattern:
public class Object2
{
    private readonly List<SubsidiaryObject> subsidiaryObjects = new List<SubsidiaryObject>();
    public List<SubsidiaryObject> SubsidiaryObjects
    {
        get
        {
            return this.subsidiaryObjects;
        }
    }
    public void ReloadSubsidiaryObjects()
    {
        this.SubsidiaryObjects.Clear();
        this.SubsidiaryObjects.AddRange(DBClass.LoadListFromDatabase());
    }
}
The pro of this:
The reference stays the same for the lifetime of the object.
The con of this:
Reloading the list is more difficult, since it cannot simply be replaced but must be cleared and refilled with the reloaded items.
What is your preferred way, and for what situations?
What do you see as pros and cons of either of these two patterns?
Since this is only a general question, not for a specific problem, every answer is welcome.

Do you need the caller to be able to modify the list? If not, you should consider returning IEnumerable<T> or ReadOnlyCollection<T> instead. And even if you do, you will probably be better off providing wrapper methods for Add/Remove so you can intercept modifications. Handing out a reference to internal state is not a good idea IMO.
A third option would be to go with option 2, but to create a new instance of the Object2 type each time you need to repopulate the list. Without additional context for the question, that is the option I would select, but there may be reasons why you would want to hold on to the original instance.
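
A rough sketch of the read-only suggestion, assuming a hypothetical BusinessObject class and a loader delegate that stands in for DBClass.LoadListFromDatabase(). The list stays private, callers get one long-lived read-only view, and modifications go through wrapper methods:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class SubsidiaryObject { }

public class BusinessObject
{
    private readonly List<SubsidiaryObject> _items = new List<SubsidiaryObject>();
    private readonly ReadOnlyCollection<SubsidiaryObject> _view;

    public BusinessObject()
    {
        // A live read-only wrapper around _items, created once so the
        // reference handed to callers stays the same across reloads.
        _view = _items.AsReadOnly();
    }

    public ReadOnlyCollection<SubsidiaryObject> SubsidiaryObjects
    {
        get { return _view; }
    }

    // Wrapper methods: the one place where modifications can be intercepted.
    public void Add(SubsidiaryObject item)
    {
        if (item == null) throw new ArgumentNullException("item");
        _items.Add(item);
    }

    public bool Remove(SubsidiaryObject item) { return _items.Remove(item); }

    // loader is a hypothetical stand-in for the database call.
    public void Reload(Func<IEnumerable<SubsidiaryObject>> loader)
    {
        var loaded = new List<SubsidiaryObject>(loader()); // load first, then swap contents
        _items.Clear();
        _items.AddRange(loaded);
    }
}
```

Note that a ReadOnlyCollection<T> over a List<T> raises no change notifications, so a bound GUI would still need to be told to refresh after Reload.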

Problem with clearing a List<T>

I don't know why I have an IndexOutOfRangeException when I am clearing a System.Collections.Generic.List<T>. Does this make sense?
List<MyObject> listOfMyObject = new List<MyObject>();
listOfMyObject.Clear();

This typically happens if multiple threads are accessing the list simultaneously. If one thread deletes an element while another calls Clear(), this exception can occur.
The "answer" in this case is to synchronize this appropriately, locking around all of your List access.
Edit:
In order to handle this, the simplest method is to encapsulate your list within a custom class, and expose the methods you need, but lock as needed. You'll need to add locking to anything that alters the collection.
This would be a simple option:
public class MyClassCollection
{
    // Private object for locking
    private readonly object syncObject = new object();
    private readonly List<MyObject> list = new List<MyObject>();
    public MyObject this[int index]
    {
        get
        {
            // Reads take the lock too, so they never overlap a write
            lock(syncObject) {
                return list[index];
            }
        }
        set
        {
            lock(syncObject) {
                list[index] = value;
            }
        }
    }
    public void Add(MyObject value)
    {
        lock(syncObject) {
            list.Add(value);
        }
    }
    public void Clear()
    {
        lock(syncObject) {
            list.Clear();
        }
    }
    // Do any other methods you need, such as remove, etc.
    // Also, you can make this class implement IList<MyObject>
    // or IEnumerable<MyObject>, but make sure to lock each
    // of the methods appropriately, in particular, any method
    // that can change the collection needs locking
}

Are you sure that that code throws an exception? I have
using System.Collections.Generic;
class MyObject { }
class Program {
    static void Main(string[] args) {
        List<MyObject> listOfMyObject = new List<MyObject>();
        listOfMyObject.Clear();
    }
}
and I do not get an exception.
Is your real-life example more complex? Perhaps you have multiple threads simultaneously accessing the list? Can we see a stack trace?
List<T>.Clear is really quite simple. Using Reflector:
public void Clear() {
    if (this._size > 0) {
        Array.Clear(this._items, 0, this._size);
        this._size = 0;
    }
    this._version++;
}
When the list is already empty, this is never going to throw an exception. However, if you are modifying the list on another thread, Array.Clear could throw an IndexOutOfRangeException. So if another thread removes an item from the list, then this._size (the number of items to clear) will be too big.

The documentation for List<T>.Clear doesn't mention any exception that this method throws, so your problem is probably elsewhere.

C#: Encapsulation of for example collections

I am wondering which one of these would be considered the cleanest or best to use and why.
One of them exposes a list of passengers, which lets the user add, remove, etc. The other hides the list and only lets the user enumerate the passengers and add to them using a special method.
Example 1
class Bus
{
    public IEnumerable<Passenger> Passengers { get { return passengers; } }
    private List<Passenger> passengers;
    public Bus()
    {
        passengers = new List<Passenger>();
    }
    public void AddPassenger(Passenger passenger)
    {
        passengers.Add(passenger);
    }
}
var bus = new Bus();
bus.AddPassenger(new Passenger());
foreach(var passenger in bus.Passengers)
    Console.WriteLine(passenger);
Example 2
class Bus
{
    public List<Passenger> Passengers { get; private set; }
    public Bus()
    {
        Passengers = new List<Passenger>();
    }
}
var bus = new Bus();
bus.Passengers.Add(new Passenger());
foreach(var passenger in bus.Passengers)
    Console.WriteLine(passenger);
The first class I would say is better encapsulated. And in this exact case, that might be the better approach (since you should probably make sure there's space left on the bus, etc.). But I guess there might be cases where the second class may be useful as well? Like if the class doesn't really care what happens to that list as long as it has one. What do you think?

In example one, it is still possible to mutate your collection, because the caller can cast the returned IEnumerable<Passenger> back to the underlying list.
Consider the following:
var passengers = (List<Passenger>)bus.Passengers;
// Now I have control of the list!
passengers.Add(...);
passengers.Remove(...);
To fix this, you might consider something like this:
class Bus
{
    // Initialized here so the members below never see a null list
    private List<Passenger> passengers = new List<Passenger>();
    // Never expose the original collection
    public IEnumerable<Passenger> Passengers
    {
        get { return passengers.Select(p => p); }
    }
    // Or expose the original collection as read only
    public ReadOnlyCollection<Passenger> ReadOnlyPassengers
    {
        get { return passengers.AsReadOnly(); }
    }
    public void AddPassenger(Passenger passenger)
    {
        passengers.Add(passenger);
    }
}

In most cases I would consider example 2 to be acceptable provided that the underlying type was extensible and/or exposed some form of onAdded/onRemoved events so that your internal class can respond to any changes to the collection.
In this case List<T> isn't suitable, as there is no way for the class to know if something has been added. Instead you should use Collection<T>, because it has several protected virtual members (InsertItem, RemoveItem, SetItem, ClearItems) that can be overridden to raise events that notify the wrapping class.
(You do also have to be aware that users of the class can modify the items in the list/collection without the parent class knowing about it, so make sure that you don't rely on the items being unchanged - unless they are immutable obviously - or you can provide onChanged style events if you need to.)
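
As an illustration of that idea, here is a minimal sketch of a Collection<Passenger> subclass that raises events from the overridable members. PassengerCollection, the Added/Removed events and the PassengerCount property are hypothetical names chosen for this example:

```csharp
using System;
using System.Collections.ObjectModel;

public class Passenger { }

// Hypothetical collection that reports changes to whoever owns it.
public class PassengerCollection : Collection<Passenger>
{
    public event Action<Passenger> Added;
    public event Action<Passenger> Removed;

    protected override void InsertItem(int index, Passenger item)
    {
        base.InsertItem(index, item); // Add() and Insert() both end up here
        if (Added != null) Added(item);
    }

    protected override void RemoveItem(int index)
    {
        var removed = this[index];
        base.RemoveItem(index); // Remove() and RemoveAt() both end up here
        if (Removed != null) Removed(removed);
    }

    protected override void SetItem(int index, Passenger item)
    {
        var replaced = this[index];
        base.SetItem(index, item); // the indexer setter ends up here
        if (Removed != null) Removed(replaced);
        if (Added != null) Added(item);
    }

    protected override void ClearItems()
    {
        var removed = new Passenger[Count];
        CopyTo(removed, 0);
        base.ClearItems();
        if (Removed != null)
            foreach (var p in removed) Removed(p);
    }
}

public class Bus
{
    public PassengerCollection Passengers { get; private set; }
    public int PassengerCount { get; private set; }

    public Bus()
    {
        Passengers = new PassengerCollection();
        // The bus keeps its own state in step with the exposed collection.
        Passengers.Added += p => PassengerCount++;
        Passengers.Removed += p => PassengerCount--;
    }
}
```

With this in place, callers can keep writing bus.Passengers.Add(...) as in example 2, while the Bus still hears about every change.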

Run your respective examples through FxCop; that should give you a hint about the risks of exposing List<T>.

I would say it all comes down to your situation. I would normally go for option 2 as it is the simplest, unless you have a business reason to add tighter controls to it.

Option 2 is the simplest, but it lets other classes add elements to and remove elements from the collection, which can be dangerous.
I think a good heuristic is to consider what the wrapper methods do. If your AddPassenger (or Remove, or others) method is simply relaying the call to the collection, then I would go for the simpler version. If you have to check the elements before inserting them, then option 1 is basically unavoidable. If you have to keep track of the elements inserted/deleted, you can go either way. With option 2 you have to register events on the collection to get notifications, and with option 1 you have to create wrappers for every operation on the list that you want to use (e.g. if you want Insert as well as Add), so I guess it depends.
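
To make the "check the elements before inserting" case concrete, here is a small sketch of option 1 with a seat limit. The capacity parameter and the TryAddPassenger name are assumptions for illustration; the question only hints that the bus might be full:

```csharp
using System;
using System.Collections.Generic;

public class Passenger { }

public class Bus
{
    private readonly List<Passenger> _passengers = new List<Passenger>();
    private readonly int _capacity; // hypothetical seat limit

    public Bus(int capacity) { _capacity = capacity; }

    public IEnumerable<Passenger> Passengers
    {
        // Iterator block: callers get a sequence they cannot cast back to the list.
        get { foreach (var p in _passengers) yield return p; }
    }

    // Returns false instead of adding when the bus is full.
    public bool TryAddPassenger(Passenger passenger)
    {
        if (passenger == null) throw new ArgumentNullException("passenger");
        if (_passengers.Count >= _capacity) return false;
        _passengers.Add(passenger);
        return true;
    }
}
```

Because the check lives in the wrapper method, there is no way to bypass it, which is the property option 2 cannot offer with a plain List<T>.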

Initializing a collection so the user doesn't have to

This might be a stupid question, but is there any common practice for initializing collection properties for a user, so they don't have to new up a new concrete collection before using it in a class?
Are any of these preferred over the others?
Option 1:
public class StringHolderNotInitialized
{
    // Force user to assign an object to MyStrings before using
    public IList<string> MyStrings { get; set; }
}
Option 2:
public class StringHolderInitializedRightAway
{
    // Initialize a default concrete object at construction
    private IList<string> myStrings = new List<string>();
    public IList<string> MyStrings
    {
        get { return myStrings; }
        set { myStrings = value; }
    }
}
Option 3:
public class StringHolderLazyInitialized
{
    private IList<string> myStrings = null;
    public IList<string> MyStrings
    {
        // If user hasn't set a collection, create one now
        // (forces a null check each time, but doesn't create object if it's never used)
        get
        {
            if (myStrings == null)
            {
                myStrings = new List<string>();
            }
            return myStrings;
        }
        set
        {
            myStrings = value;
        }
    }
}
Option 4:
Any other good options for this?

In this case, I don't see the reason for the lazy loading, so I would go with option 2. If you are creating a ton of these objects, then the number of allocations and GCs that result would be an issue, but that's not something to consider really unless it proves to be a problem later.
Additionally, for things like this, I would typically not allow the assignment of the IList to the class. I would make this read-only. In not controlling the implementation of the IList, you open yourself up to unexpected implementations.
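
A short sketch of that read-only variant of option 2, using the hypothetical class name StringHolderReadOnly. The collection is created once and the property has no setter:

```csharp
using System.Collections.Generic;

public class StringHolderReadOnly
{
    // Created once at construction; the field can never be null or replaced.
    private readonly IList<string> myStrings = new List<string>();

    // No setter: callers can add to the collection but cannot swap it out.
    public IList<string> MyStrings
    {
        get { return myStrings; }
    }
}
```

Callers can still populate it conveniently with a nested collection initializer, for example new StringHolderReadOnly { MyStrings = { "a", "b" } }, which calls Add on the existing list rather than assigning a new one.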

For Option 1: If you want to force the user to initialize something before using your class, the best place to enforce that is the constructor.
For Option 2: If you're not forcing the user, then you really have to initialize an empty collection yourself in the constructor/initializer.
For Option 3: Lazy initialization only makes sense if creating the collection involves a lot of work or is a slow/bulky operation.
My vote goes for option 2.

The only real reason for using the lazy loading solution is for an optimization. And the first rule of optimization is "don't optimize unless you've measured" :)
Based on that I would go with the solution least likely to cause an error. In this case that would be solution #2. Setting it in an initializer virtually eliminates the chance of a null ref here. The only way it will occur is if the user explicitly sets it to null.
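
If the setter has to stay, one way to close that remaining gap is to reject null on assignment. This is only an illustration, with the hypothetical class name StringHolderNullGuarded; none of the answers here propose it explicitly:

```csharp
using System;
using System.Collections.Generic;

public class StringHolderNullGuarded
{
    // Initialized right away, as in option 2
    private IList<string> myStrings = new List<string>();

    public IList<string> MyStrings
    {
        get { return myStrings; }
        set
        {
            // Fail at the point of the bad assignment, not at a later use
            if (value == null) throw new ArgumentNullException("value");
            myStrings = value;
        }
    }
}
```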
