I wrote some classes to test multithreading using SynchronizedCollection.
class MultithreadTesting
{
public readonly SynchronizedCollection<int> testlist = new SynchronizedCollection<int>();
public SynchronizedReadOnlyCollection<int> pubReadOnlyProperty
{
get
{
return new SynchronizedReadOnlyCollection<int>(testlist.SyncRoot, testlist);
}
}
public void Test()
{
int numthreads = 20;
Thread[] threads = new Thread[numthreads];
List<Task> taskList = new List<Task>();
for (int i = 0; i < numthreads / 2; i++)
{
taskList.Add(Task.Factory.StartNew(() =>
{
for (int j = 0; j < 100000; j++)
{
testlist.Add(42);
}
}));
}
for (int i = numthreads / 2; i < numthreads; i++)
{
taskList.Add(Task.Factory.StartNew(() =>
{
var sum = 0;
foreach (int num in pubReadOnlyProperty)
{
sum += num;
}
}));
}
Task.WaitAll(taskList.ToArray());
testlist.Clear();
}
}
To run it, I use:
MultithreadTesting test = new MultithreadTesting();
while (true)
test.Test();
But the code throws System.ArgumentException: 'Destination array was not long enough. Check destIndex and length, and the array's lower bounds.'
If I try to use testlist in foreach, I get
System.InvalidOperationException: 'Collection was modified; enumeration operation may not execute.'
However, MSDN tells us
SynchronizedReadOnlyCollection Class
Provides a thread-safe, read-only collection that contains objects of
a type specified by the generic parameter as elements.
The root cause of the error is that constructing a List<T> from a collection that other threads are modifying is not thread-safe.
Let's see what happens when a new SynchronizedReadOnlyCollection is constructed. The exception occurs in the following line:
return new SynchronizedReadOnlyCollection<int>(testlist.SyncRoot, testlist);
As the exception's stack trace shows, the List<T> constructor is involved in the construction process:
at System.Collections.Generic.SynchronizedCollection`1.CopyTo(T[] array, Int32 index)
at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
at System.Collections.Generic.SynchronizedReadOnlyCollection`1..ctor(Object syncRoot, IEnumerable`1 list)
The following snippet from the List<T> constructor shows where the error happens. The code is copied from the MS reference source, with unrelated parts removed for easier reading. Note that between comments (1) and (2) other threads can still modify the collection:
public List(IEnumerable<T> collection) {
ICollection<T> c = collection as ICollection<T>;
// (1) count is now current Count of collection
int count = c.Count;
// other threads can modify collection meanwhile
if (count == 0)
{
_items = _emptyArray;
}
else {
_items = new T[count];
// (2) SynchronizedCollection.CopyTo is called (which itself is thread-safe)
// The collection can still be modified between (1) and (2)
// If c.Count has grown beyond _items.Length by now, CopyTo throws the ArgumentException.
c.CopyTo(_items, 0);
_size = count;
}
}
Solution
The problem can easily be fixed by holding the collection's lock, which blocks modifications of testlist, while the new SynchronizedReadOnlyCollection is constructed.
public SynchronizedReadOnlyCollection<int> pubReadOnlyProperty
{
get
{
lock (testlist.SyncRoot)
{
return new SynchronizedReadOnlyCollection<int>(testlist.SyncRoot, testlist);
}
}
}
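The same count-then-copy race can be sketched without SynchronizedCollection. The example below is only an illustration under stated assumptions: `SnapshotSketch`, `Snapshot`, and `Run` are hypothetical names, and a plain `List<int>` guarded by an explicit lock object stands in for `testlist` and its `SyncRoot`. The point is that reading `Count` and copying the items happen under one lock, which is what the `lock` in the getter above achieves.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SnapshotSketch
{
    // Hypothetical helper (not a library API): reads Count and copies the
    // items while holding the writers' lock, so no Add can run in between.
    public static T[] Snapshot<T>(List<T> source, object syncRoot)
    {
        lock (syncRoot)
        {
            var copy = new T[source.Count];
            source.CopyTo(copy, 0);
            return copy;
        }
    }

    // Runs one writer task while the calling thread keeps taking snapshots.
    // Returns the length of the final snapshot, taken after the writer ends.
    public static int Run(int items)
    {
        var list = new List<int>();
        var syncRoot = new object();

        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < items; i++)
            {
                lock (syncRoot) { list.Add(42); }
            }
        });

        // Concurrent readers: each snapshot is internally consistent.
        while (!writer.IsCompleted)
        {
            Snapshot(list, syncRoot);
        }

        writer.Wait();
        return Snapshot(list, syncRoot).Length;
    }
}
```

Calling `SnapshotSketch.Run(100000)` should complete without an ArgumentException, because a writer can never change the list between the `Count` read and the copy.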
Related
I am trying to dynamically add and remove subscribers to an Observable BlockingCollection using the ThreadPoolScheduler.
I am not sure if this is a problem with my code or in RX itself, but I had assumed in this case I should be able to subscribe/unsubscribe as needed.
I have reduced the issue down to the test pasted below.
The code works correctly until I call Dispose on subscribers and then add new subscribers.
Essentially, I seem to get old threads still dequeuing the observables but never doing anything with them.
Here is the unit test. It sets up 32 subscribers, adds 64 objects, then unsubscribes and repeats the same test. I have removed any additional code (and added some sleeps just for the test, to make sure the threads are definitely done before I unsubscribe).
The first 64 objects are processed correctly, but from the second set only 32 objects are passed to my subscriber.
[TestClass]
public class RxTests
{
public class ObservTest
{
public BlockingCollection<ObserverTests.UnitTestObservable> mBlockingCollection = new BlockingCollection<ObserverTests.UnitTestObservable>();
public IObservable<ObserverTests.UnitTestObservable> mObservableBlockingCollection;
private static readonly object ObservableLock = new object();
private static volatile ObservTest ObservableInstance;
public static ObservTest Instance
{
get
{
if (ObservableInstance != null)
return ObservableInstance;
lock (ObservableLock)
{
if (ObservableInstance == null)
{
ObservTest observable = new ObservTest();
observable.mObservableBlockingCollection = observable.mBlockingCollection.GetConsumingEnumerable().ToObservable(ThreadPoolScheduler.Instance);
ObservableInstance = observable;
}
return ObservableInstance;
}
}
}
private int count = 0;
public void Release()
{
Interlocked.Increment(ref count);
Console.WriteLine("Release {0} : {1}", count, Thread.CurrentThread.ManagedThreadId);
}
public void LogCount()
{
Console.WriteLine("Total :{0}", count);
}
}
[TestMethod]
public void TestMethod1()
{
IList<IDisposable> subscribers = new List<IDisposable>();
for (int count = 0; count < 32; count++)
{
IDisposable disposable = ObservTest.Instance.mObservableBlockingCollection.Subscribe(Observe);
subscribers.Add(disposable);
}
for (int count = 0; count < 64; count++)
{
ObserverTests.UnitTestObservable observable = new ObserverTests.UnitTestObservable
{
Name = string.Format("{0}", count)
};
ObservTest.Instance.mBlockingCollection.Add(observable);
}
Thread.Sleep(5000);
foreach (IDisposable disposable in subscribers)
{
disposable.Dispose();
}
subscribers.Clear();
for (int count = 0; count < 32; count++)
{
IDisposable disposable = ObservTest.Instance.mObservableBlockingCollection.Subscribe(Observe);
subscribers.Add(disposable);
}
for (int count = 0; count < 64; count++)
{
ObserverTests.UnitTestObservable observable = new ObserverTests.UnitTestObservable
{
Name = string.Format("{0}", count)
};
ObservTest.Instance.mBlockingCollection.Add(observable);
}
Thread.Sleep(3000);
ObservTest.Instance.LogCount();
}
public static void Observe(ObserverTests.UnitTestObservable observable)
{
Console.WriteLine("Observe {0} : {1}", observable.Name, Thread.CurrentThread.ManagedThreadId);
ObservTest.Instance.Release();
}
}
So the final count in the output is 96 when I would expect it to be 128.
If I reduce the number of initial subscribers from 32, the processed count increases.
E.g. if I reduce the count from 32 to 16 in the first loop, I get a count of 112.
If I reduce it to 8, I get 120.
I am aiming for a system where as the number of tasks being executed is increased so too are the number of subscribers available to process them.
I have created long chains of classes, with each link(class) knowing only the next and previous link.
I see great advantage over Arrays when the index is not important.
public class ChainLink
{
public ChainLink previousLink, nextLink;
// Accessors here
}
Questions :
What is this technique actually called? (I don't know what to search)
Is there a .Net class that does the same thing?
Is there a noticeable performance impact vs. Array or List?
Example of the Accessors I use
Assign to the chain :
public ChainLink NextLink {
get{ return _nextLink;}
set {
_nextLink = value;
if (value != null)
value._previousLink = this;
}
}
public void InsertNext (ChainLink link)
{
link.NextLink = _nextLink;
link.PreviousLink = this;
}
Shortening the chain :
If I un-assign the next link of a chain, leaving the remaining links un-referenced by the main program, the garbage collector will dispose of the data for me.
Testing circular referencing :
public bool IsCircular ()
{
ChainLink link = this;
while (link != null) {
link = link._nextLink;
if (link == this)
return true;
}
return false;
}
Offset index :
public ChainLink this [int offset] {
get {
if (offset > 0 && _nextLink != null)
return _nextLink [offset - 1];
if (offset < 0 && _previousLink != null)
return _previousLink [offset + 1];
return this;
}
}
1) This structure is called a doubly linked list.
2) This implementation exists in .NET as LinkedList<T>.
3) There are a lot of articles on this topic:
here or
this SO post
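As a rough sketch of how the built-in class covers the ChainLink operations from the question (insert after a link, drop a link, walk forward), assuming string payloads and a hypothetical `LinkedListSketch.Demo` method:

```csharp
using System;
using System.Collections.Generic;

public static class LinkedListSketch
{
    // Builds a small list, inserts and removes nodes, then walks it
    // from First via Next and returns the remaining values.
    public static string Demo()
    {
        var chain = new LinkedList<string>();

        // AddLast returns the node, which plays the role of a ChainLink.
        LinkedListNode<string> a = chain.AddLast("a");
        LinkedListNode<string> c = chain.AddLast("c");

        chain.AddAfter(a, "b"); // list is now a, b, c
        chain.Remove(c);        // list is now a, b

        var parts = new List<string>();
        for (LinkedListNode<string> node = chain.First; node != null; node = node.Next)
        {
            parts.Add(node.Value);
        }
        return string.Join(",", parts);
    }
}
```

Each `LinkedListNode<T>` also exposes `Previous`, so walking backwards from `chain.Last` works the same way.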
Others have answered the question about what this is called and the equivalent .Net class (a LinkedList), but I thought I'd quickly look at the speed of your ChainLink vs an array and a List.
I added a method Foo() to your ChainLink class so that there was something to access for each object instance:
public class ChainLink
{
public ChainLink previousLink, nextLink;
// Accessors here
public void Foo()
{ }
}
The first method creates an array and then times how long it takes to access each item in the array:
private void TestArray()
{
// Setup the Array
ChainLink[] Test = new ChainLink[1000000];
for (int i = 0; i < 1000000; i++)
Test[i] = new ChainLink();
// Use a Stopwatch to measure time
Stopwatch SW;
SW = new Stopwatch();
SW.Start();
// Go through items in the array
for (int i = 0; i < Test.Length; i++)
Test[i].Foo();
// Stop timer and report results
SW.Stop();
Console.WriteLine(SW.Elapsed);
}
Next, I created a method to use a List<T> and time how long it takes to access each item in it:
private void TestList()
{
// Setup the list
List<ChainLink> Test = new List<ChainLink>();
for (int i = 0; i < 1000000; i++)
Test.Add(new ChainLink());
// Use a Stopwatch to measure time
Stopwatch SW;
SW = new Stopwatch();
SW.Start();
// Go through items in the list
for (int i = 0; i < Test.Count; i++)
Test[i].Foo();
// Stop timer and report results
SW.Stop();
Console.WriteLine(SW.Elapsed);
}
Finally, I created a method to use your ChainLink and move through the next item until there are no more:
private void TestChainLink()
{
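// Setup the linked list, keeping a reference to the first link
ChainLink Head = new ChainLink();
ChainLink Test = Head;
for (int i = 0; i < 1000000; i++)
{
Test.nextLink = new ChainLink();
Test = Test.nextLink;
}
// Start the traversal at the head; after setup, Test points at the last link
Test = Head;
// Use a Stopwatch to measure time
Stopwatch SW;
SW = new Stopwatch();
SW.Start();
// Go through items in the linked list
while (Test != null)
{
Test.Foo();
Test = Test.nextLink;
}
// Stop timer and report results
SW.Stop();
Console.WriteLine(SW.Elapsed);
}
Running the array and list tests yields:
TestArray(): 00:00:00.0058576
TestList(): 00:00:00.0103650
Multiple iterations reveal similar figures.
The 00:00:00.0000014 originally reported for TestChainLink() is not comparable: it was measured without the reset to Head, so the timed loop started at the last link and visited a single node. That figure does not support the claim that ChainLink is thousands of times faster than an array or a List. With the traversal starting at the head, all three tests touch about a million objects; rerun them to get a meaningful comparison.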
I start 4 threads in a loop. Each thread gets a reference to an array element to write the result.
But on the line where I create each thread, I get a System.IndexOutOfRangeException. I'm amazed that the index "i" is going out of range.
Here is an example:
void ThreadsStarter()
{
double[] data = new double[4];
for(int i = 0; i < 4; i++)
{
Thread my_thread = new Thread(() => Work(data[i]));
my_thread.Start();
}
}
void Work(double data)
{
}
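Why is this happening?
This is a common error: the lambda captures the variable i itself, not the value it had when the lambda was created. The lambda body runs when the thread starts, by which time the loop may already have advanced i to 4, so data[i] is out of range. Copy i into a temporary inside the loop and use that temporary in the lambda to fix the issue: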
void ThreadsStarter()
{
double[] data = new double[4];
for(int i = 0; i < 4; i++)
{
var temp = i;
Thread my_thread = new Thread(() => Work(ref data[temp]));
my_thread.Start();
}
}
void Work(ref double data)
{
}
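The capture behaviour can be shown without threads. This is a minimal sketch; `ClosureSketch` and its two methods are hypothetical names for illustration. The delegates are invoked only after the loop has finished, which makes the effect deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureSketch
{
    // All lambdas capture the single loop variable i, so once the loop
    // has finished they all read its final value.
    public static int[] SharedVariable()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 4; i++)
        {
            readers.Add(() => i);
        }
        return readers.Select(r => r()).ToArray(); // 4, 4, 4, 4
    }

    // temp is declared inside the loop body, so each iteration gets a
    // fresh variable and each lambda keeps its own value.
    public static int[] PerIterationCopy()
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < 4; i++)
        {
            int temp = i;
            readers.Add(() => temp);
        }
        return readers.Select(r => r()).ToArray(); // 0, 1, 2, 3
    }
}
```

With threads the timing is less predictable than here, but the mechanism is the same: any thread that reads i after the loop has ended sees 4 and indexes past the end of the array.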
I want to know why I get an error when I write the statement below, although I have declared T at class level:
IList<T> targetObjectsCollection = new List<T>();
for (int counter = 0; counter < dataTransferObjects.Count; counter++)
{
targetObjectsCollection.Add(MappSharePointDAOToDTO(sharePointDaos[counter], dataTransferObjects[counter]));
}
When I changed it to the following statement, the error went away:
IList<IMapperMarker> targetObjectsCollection = new List<IMapperMarker>();
for (int counter = 0; counter < dataTransferObjects.Count; counter++)
{
targetObjectsCollection.Add(MappSharePointDAOToDTO(sharePointDaos[counter], dataTransferObjects[counter]));
}
Can anybody explain why?
You do not seem to have defined T. It's a placeholder. It requires definition.
This code might work if it were used in a context where T had a definition. For instance,
private IList<T> AddDataTransferObjects<T>(IList<T> dataTransferObjects)
where T : IMapperMarker
{
IList<T> targetObjectsCollection = new List<T>();
for (int counter = 0; counter < dataTransferObjects.Count; counter++)
{
targetObjectsCollection.Add(MappSharePointDAOToDTO(sharePointDaos[counter], dataTransferObjects[counter]));
}
return targetObjectsCollection;
}
If you called it as follows:
IList<IMapperMarker> dtoList = Something();
var list = AddDataTransferObjects(dtoList);
In this case, the inner T would be bound to the type IMapperMarker.
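A self-contained sketch of the same idea follows. Everything in it is an assumption made for illustration: `IMapperMarker` is given a single `Name` property, `CustomerDto` is a made-up implementation, and `CopyAll` stands in for the mapping method, whose real body is not shown in the question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical marker interface and DTO, defined only for this sketch.
public interface IMapperMarker
{
    string Name { get; }
}

public sealed class CustomerDto : IMapperMarker
{
    public string Name { get; set; } = "";
}

public static class GenericSketch
{
    // T is declared on the method (<T>) and constrained to IMapperMarker,
    // so List<T> and IList<T> are valid inside the body.
    public static IList<T> CopyAll<T>(IList<T> source) where T : IMapperMarker
    {
        IList<T> target = new List<T>();
        for (int counter = 0; counter < source.Count; counter++)
        {
            target.Add(source[counter]);
        }
        return target;
    }
}
```

Given `IList<CustomerDto> dtos`, the call `GenericSketch.CopyAll(dtos)` binds T to CustomerDto through type inference; passing an `IList<IMapperMarker>` binds T to IMapperMarker instead.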
I'm testing a self-written element generator (ICollection<string>) and comparing the calculated count to the actual count to get an idea of whether there's an error in my algorithm.
As this generator can generate lots of elements on demand, I'm looking into Partitioner<string>. I have implemented a basic one, which also seems to produce valid enumerators that together yield the same number of strings as calculated.
Now I want to test how this behaves when run in parallel (again, first testing for the correct count):
MyGenerator generator = new MyGenerator();
MyPartitioner partitioner = new MyPartitioner(generator);
int isCount = partitioner.AsParallel().Count();
int shouldCount = generator.Count;
bool same = isCount == shouldCount; // false
I don't get why this count is not equal! What is the ParallelQuery<string> doing?
generator.Count() == generator.Count // true
partitioner.GetPartitions(xyz).Select(enumerator =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
return count;
}).Sum() == generator.Count // true
So, I'm currently not seeing an error in my code. Next I tried to manually count that ParallelQuery<string>:
int count = 0;
partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref count));
count == generator.Count // true
To sum up: everything else counts my enumerable correctly, and ParallelQuery.ForAll enumerates exactly generator.Count elements. But what does ParallelQuery.Count() do?
If the correct count is about 10k, ParallelQuery sees 40k.
internal sealed class PartialWordEnumerator : IEnumerator<string>
{
private object sync = new object();
private readonly IEnumerable<char> characters;
private readonly char[] limit;
private char[] buffer;
private IEnumerator<char>[] enumerators;
private int position = 0;
internal PartialWordEnumerator(IEnumerable<char> characters, char[] state, char[] limit)
{
this.characters = new List<char>(characters);
this.buffer = (char[])state.Clone();
if (limit != null)
{
this.limit = (char[])limit.Clone();
}
this.enumerators = new IEnumerator<char>[this.buffer.Length];
for (int i = 0; i < this.buffer.Length; i++)
{
this.enumerators[i] = SkipTo(state[i]);
}
}
private IEnumerator<char> SkipTo(char c)
{
IEnumerator<char> first = this.characters.GetEnumerator();
IEnumerator<char> second = this.characters.GetEnumerator();
while (second.MoveNext())
{
if (second.Current == c)
{
return first;
}
first.MoveNext();
}
throw new InvalidOperationException();
}
private bool ReachedLimit
{
get
{
if (this.limit == null)
{
return false;
}
for (int i = 0; i < this.buffer.Length; i++)
{
if (this.buffer[i] != this.limit[i])
{
return false;
}
}
return true;
}
}
public string Current
{
get
{
if (this.buffer == null)
{
throw new ObjectDisposedException(typeof(PartialWordEnumerator).FullName);
}
return new string(this.buffer);
}
}
object IEnumerator.Current
{
get { return this.Current; }
}
public bool MoveNext()
{
lock (this.sync)
{
if (this.position == this.buffer.Length)
{
this.position--;
}
if (this.position == -1)
{
return false;
}
IEnumerator<char> enumerator = this.enumerators[this.position];
if (enumerator.MoveNext())
{
this.buffer[this.position] = enumerator.Current;
this.position++;
if (this.position == this.buffer.Length)
{
return !this.ReachedLimit;
}
else
{
return this.MoveNext();
}
}
else
{
this.enumerators[this.position] = this.characters.GetEnumerator();
this.position--;
return this.MoveNext();
}
}
}
public void Dispose()
{
this.position = -1;
this.buffer = null;
}
public void Reset()
{
throw new NotSupportedException();
}
}
public override IList<IEnumerator<string>> GetPartitions(int partitionCount)
{
IEnumerator<string>[] enumerators = new IEnumerator<string>[partitionCount];
List<char> characters = new List<char>(this.generator.Characters);
int length = this.generator.Length;
int characterCount = this.generator.Characters.Count;
int steps = Math.Min(characterCount, partitionCount);
int skip = characterCount / steps;
for (int i = 0; i < steps; i++)
{
char c = characters[i * skip];
char[] state = new string(c, length).ToCharArray();
char[] limit = null;
if ((i + 1) * skip < characterCount)
{
c = characters[(i + 1) * skip];
limit = new string(c, length).ToCharArray();
}
if (i == steps - 1)
{
limit = null;
}
enumerators[i] = new PartialWordEnumerator(characters, state, limit);
}
for (int i = steps; i < partitionCount; i++)
{
enumerators[i] = Enumerable.Empty<string>().GetEnumerator();
}
return enumerators;
}
EDIT: I believe I have found the solution. According to the documentation on IEnumerator.MoveNext (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.
According to the following logic:
private bool ReachedLimit
{
get
{
if (this.limit == null)
{
return false;
}
for (int i = 0; i < this.buffer.Length; i++)
{
if (this.buffer[i] != this.limit[i])
{
return false;
}
}
return true;
}
}
The call to MoveNext() will return false only one time: when the buffer is exactly equal to the limit. Once the enumerator has passed the limit, ReachedLimit returns false again, so return !this.ReachedLimit returns true and the enumerator continues past the limit until it runs out of characters to enumerate. Apparently, the implementation of ParallelQuery.Count() calls MoveNext() again after it has returned false, and since MoveNext() then returns true again, the enumerator happily yields more elements. Your custom code that walks the enumerator manually does not do this, and apparently neither does the ForAll call, so they "accidentally" return the correct results.
The simplest fix is to remember the return value from MoveNext() once it becomes false:
private bool _canMoveNext = true;
public bool MoveNext()
{
if (!_canMoveNext) return false;
...
if (this.position == this.buffer.Length)
{
if (this.ReachedLimit) _canMoveNext = false;
...
}
Now once it begins returning false, it will return false for every future call and this returns the correct result from AsParallel().Count(). Hope this helps!
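To make the latch concrete, here is a minimal sketch that is independent of PartialWordEnumerator. `BoundedCounter` is a hypothetical enumerator that, like the original, detects its end with an equality test; the `useLatch` flag is an assumption added so the broken and fixed behaviour can be compared side by side.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical enumerator yielding 1 .. limit - 1. The end is detected by
// an equality test, so without the latch it "revives" once past the limit.
public sealed class BoundedCounter : IEnumerator<int>
{
    private readonly int limit;
    private readonly bool useLatch;
    private int current;
    private bool finished;

    public BoundedCounter(int limit, bool useLatch)
    {
        this.limit = limit;
        this.useLatch = useLatch;
    }

    public int Current => current;
    object IEnumerator.Current => Current;

    public bool MoveNext()
    {
        // The latch: once false has been returned, keep returning false.
        if (useLatch && finished) return false;

        current++;
        if (current == limit)
        {
            finished = true;
            return false;
        }
        return true;
    }

    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }

    // Calls MoveNext a fixed number of times, including calls made after
    // the first false, and counts how many of them returned true.
    public static int CountTrue(IEnumerator<int> e, int calls)
    {
        int seen = 0;
        for (int i = 0; i < calls; i++)
        {
            if (e.MoveNext()) seen++;
        }
        return seen;
    }
}
```

With a limit of 3 and ten calls, the latched version reports 2 elements. The version without the latch reports 9, because every call after the single false returns true again, which is the kind of over-count seen with AsParallel().Count().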
The documentation on Partitioner notes (emphasis mine):
The static methods on Partitioner are all thread-safe and may
be used concurrently from multiple threads. However, while a created
partitioner is in use, the underlying data source should not be
modified, whether from the same thread that is using a partitioner or
from a separate thread.
From what I can understand of the code you have given, ParallelQuery.Count() seems the most likely to expose thread-safety issues, because it may be iterating multiple enumerators at the same time, whereas all the other approaches run the enumerators one after another. Without seeing the code you are using for MyGenerator and MyPartitioner, it is difficult to determine whether thread-safety issues could be the culprit.
To demonstrate, I have written a simple enumerator that returns the first hundred numbers as strings. Also, I have a partitioner, that distributes the elements in the underlying enumerator over a collection of numPartitions separate lists. Using all the methods you described above on our 12-core server (when I output numPartitions, it uses 12 by default on this machine), I get the expected result of 100 (this is LINQPad-ready code):
void Main()
{
var partitioner = new SimplePartitioner(GetEnumerator());
GetEnumerator().Count().Dump();
partitioner.GetPartitions(10).Select(enumerator =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
return count;
}).Sum().Dump();
var theCount = 0;
partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref theCount));
theCount.Dump();
partitioner.AsParallel().Count().Dump();
}
// Define other methods and classes here
public IEnumerable<string> GetEnumerator()
{
for (var i = 1; i <= 100; i++)
yield return i.ToString();
}
public class SimplePartitioner : Partitioner<string>
{
private IEnumerable<string> input;
public SimplePartitioner(IEnumerable<string> input)
{
this.input = input;
}
public override IList<IEnumerator<string>> GetPartitions(int numPartitions)
{
var list = new List<string>[numPartitions];
for (var i = 0; i < numPartitions; i++)
list[i] = new List<string>();
var index = 0;
foreach (var s in input)
list[(index = (index + 1) % numPartitions)].Add(s);
IList<IEnumerator<string>> result = new List<IEnumerator<string>>();
foreach (var l in list)
result.Add(l.GetEnumerator());
return result;
}
}
Output:
100
100
100
100
This clearly works. Without more information it is impossible to tell you what is not working in your particular implementation.