Using SqlBulkCopy in a multithread scenario with ThreadPool issue - c#

I'm facing a dilemma (!).
In a first scenario, I implemented a solution that replicates data from one database to another using SqlBulkCopy synchronously, and I had no problems at all.
Now, using ThreadPool, I implemented the same thing asynchronously, with one thread per table. Everything works fine at first, but after some time (usually 1 hour, because the copy operations always take about the same time), the operations sent to the ThreadPool stop being executed. Each thread uses its own SqlBulkCopy with its own SqlConnection.
I have already checked the number of free threads, and they are all free at the beginning of the invocation. I have one AutoResetEvent to wait until the threads finish their job before launching again, and a FIFO semaphore that holds the counter of active threads.
Is there some issue that I have forgotten or that I should evaluate when using SqlBulkCopy? I would appreciate some help, because I have run out of ideas ;)
->Usage
SemaphoreFIFO waitingThreads = new SemaphoreFIFO();
AutoResetEvent autoResetEvent = new AutoResetEvent(false);
(...)
List<TableMappingHolder> list = BulkCopy.Mapping(tables);
waitingThreads.Put(list.Count, 300000);
for (int i = 0; i < list.Count; i++){
ThreadPool.QueueUserWorkItem(call =>
{
//Replication
(...)
waitingThreads.Get();
if (waitingThreads.Counter == 0)
autoResetEvent.Set();
});
}
bool finalized = autoResetEvent.WaitOne(300000);
(...)
//Bulk Copy
public bool SetData(SqlDataReader reader, string _destinationTableName, List<SqlBulkCopyColumnMapping> _sqlBulkCopyColumnMappings)
{
using (SqlConnection destinationConnection =
new SqlConnection(ConfigurationManager.ConnectionStrings["dconn"].ToString()))
{
destinationConnection.Open();
// Set up the bulk copy object.
// Note that the column positions in the source
// data reader match the column positions in
// the destination table so there is no need to
// map columns.
using (SqlBulkCopy bulkCopy =
new SqlBulkCopy(destinationConnection)) {
bulkCopy.BulkCopyTimeout = 300000; // note: BulkCopyTimeout is expressed in seconds, not milliseconds
bulkCopy.DestinationTableName = _destinationTableName;
// Set up the column mappings by name.
foreach (SqlBulkCopyColumnMapping columnMapping in _sqlBulkCopyColumnMappings)
bulkCopy.ColumnMappings.Add(columnMapping);
try{
// Write from the source to the destination.
bulkCopy.WriteToServer(reader);
}
catch (Exception ex){return false;}
finally
{
try { reader.Close(); }
catch (Exception e) { /* log */ }
try { bulkCopy.Close(); }
catch (Exception e) { /* log */ }
try { destinationConnection.Close(); }
catch (Exception e) { /* log */ }
}
}
}
return true;
}
#
Semaphore
public sealed class SemaphoreFIFO
{
private int _counter;
private readonly LinkedList<int> waitQueue = new LinkedList<int>();
public int Counter
{
get { return _counter; }
}
private void internalNotify()
{
if (waitQueue.Count > 0 && _counter == 0)
{
Monitor.PulseAll(waitQueue);
}
}
public void Get()
{
lock (waitQueue)
{
_counter --;
internalNotify();
}
}
public bool Put(int n, int timeout)
{
if (timeout < 0 && timeout != Timeout.Infinite)
throw new ArgumentOutOfRangeException("timeout");
if (n < 0)
throw new ArgumentOutOfRangeException("n");
lock (waitQueue)
{
if (waitQueue.Count == 0 && _counter ==0)
{
_counter +=n;
internalNotify();
return true;
}
int endTime = Environment.TickCount + timeout;
LinkedListNode<int> me = waitQueue.AddLast(n);
try
{
while (true)
{
Monitor.Wait(waitQueue, timeout);
if (waitQueue.First == me && _counter ==0)
{
_counter += n;
waitQueue.RemoveFirst();
internalNotify();
return true;
}
if (timeout != Timeout.Infinite)
{
int remainingTime = endTime - Environment.TickCount;
if (remainingTime <= 0)
{
// TIMEOUT
if (waitQueue.First == me)
{
waitQueue.RemoveFirst();
internalNotify();
}
else
waitQueue.Remove(me);
return false;
}
timeout = remainingTime;
}
}
}
catch (ThreadInterruptedException e)
{
// INTERRUPT
if (waitQueue.First == me)
{
waitQueue.RemoveFirst();
internalNotify();
}
else
waitQueue.Remove(me);
throw; // rethrow without resetting the stack trace
}
}
}
}

I would just go back to using SQLBulkCopy synchronously. I'm not sure what you gain by doing a bunch of bulk copies all at the same time (instead of one after the other). It may complete everything a bit faster, but I'm not even sure of that.
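If the parallel version is kept, one thing worth ruling out is a work item that exits without decrementing the counter, for example because the elided replication code throws before `waitingThreads.Get()` runs. The following is only a minimal sketch of one way to guard against that, not the original poster's code: the `ReplicationRound` class and `RunAll` method are hypothetical names, and the work items stand in for the per-table copy.

```csharp
using System;
using System.Threading;

public static class ReplicationRound
{
    // Queues every work item on the ThreadPool and blocks until all of them
    // have finished or the timeout elapses. Returns true if all completed in time.
    public static bool RunAll(Action[] workItems, int timeoutMs)
    {
        if (workItems.Length == 0) return true;

        int pending = workItems.Length;
        // Deliberately not disposed: a late work item may still call Set() after a timeout.
        ManualResetEvent allDone = new ManualResetEvent(false);

        foreach (Action item in workItems)
        {
            Action work = item; // local copy so each closure captures its own item
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    work();
                }
                catch (Exception ex)
                {
                    // An unhandled exception on a pool thread would end the process.
                    Console.Error.WriteLine(ex.Message);
                }
                finally
                {
                    // The decrement always runs, and exactly one item observes zero.
                    if (Interlocked.Decrement(ref pending) == 0)
                        allDone.Set();
                }
            });
        }
        return allDone.WaitOne(timeoutMs);
    }
}
```

Because the decrement and the zero check are a single atomic operation, only the last work item signals the event, so no stale signal can leak into the next round.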

Related

IThreadPoolWorkItem not collected by GC

I have an embedded Debian board with Mono running a .NET 4.0 application with a fixed number of threads (no actions, no tasks). Because of memory issues I used CLR Profiler on Windows to analyse the memory heap.
The following diagram shows that IThreadPoolWorkItems are not collected (at least not in generation 0):
Now, I really don't have any idea where these objects could be used and why they aren't collected.
What could be causing this behaviour, and where would IThreadPoolWorkItem be used?
What can I do to find out where they are being used? (I couldn't find them by searching the code or looking in CLR Profiler yet.)
Edit
...
private Dictionary<byte, Telegram> _incoming = new Dictionary<byte, Telegram>();
private Queue<byte> _serialDataQueue;
private byte[] _receiveBuffer = new byte[2048];
private Dictionary<Telegram, Telegram> _resultQueue = new Dictionary<Telegram, Telegram>();
private static Telegram _currentTelegram;
ManualResetEvent _manualReset = new ManualResetEvent(false);
...
// Called from other thread (class) to send new telegrams
public bool Send(Dictionary<byte, Telegram> telegrams, out IDictionary<Telegram, Telegram> received)
{
try
{
_manualReset.Reset();
_incoming.Clear(); // clear all prev sending telegrams
_resultQueue.Clear(); // clear the receive queue
using (token = new CancellationTokenSource())
{
foreach (KeyValuePair<byte, Telegram> pair in telegrams)
{
_incoming.Add(pair.Key, pair.Value);
}
int result = WaitHandle.WaitAny(new[] { token.Token.WaitHandle, _manualReset });
received = _resultQueue.Clone<Telegram, Telegram>();
_resultQueue.Clear();
return result == 1;
}
}
catch (Exception err)
{
...
return false;
}
}
// Communication-Thread
public void Run()
{
while(true)
{
...
GetNextTelegram(); // _currentTelegram is set there and _incoming Queue is dequeued
byte[] telegramArray = GenerateTelegram(_currentTelegram, ... );
bool telegramReceived = SendReceiveTelegram(3000, telegramArray);
...
}
}
// Helper method to send and receive telegrams
private bool SendReceiveTelegram(int timeOut, byte[] telegram)
{
// send telegram
try
{
// check if serial port is open
if (_serialPort != null && !_serialPort.IsOpen)
{
_serialPort.Open();
}
Thread.Sleep(10);
_serialPort.Write(telegram, 0, telegram.Length);
}
catch (Exception err)
{
log.ErrorFormat(err.Message, err);
return false;
}
// receive telegram
int offset = 0, bytesRead;
_serialPort.ReadTimeout = timeOut;
int bytesExpected = GetExpectedBytes(_currentTelegram);
if (bytesExpected == -1)
return false;
try
{
while (bytesExpected > 0 &&
(bytesRead = _serialPort.Read(_receiveBuffer, offset, bytesExpected)) > 0)
{
offset += bytesRead;
bytesExpected -= bytesRead;
}
for (int index = 0; index < offset; index++)
_serialDataQueue.Enqueue(_receiveBuffer[index]);
List<byte> resultList;
// looks if telegram is valid and removes bytes from _serialDataQueue
bool isValid = IsValid(_serialDataQueue, out resultList, currentTelegram);
if (isValid && resultList != null)
{
// only add to queue if its really needed!!
byte[] receiveArray = resultList.ToArray();
_resultQueue.Add((Telegram)currentTelegram.Clone(), respTelegram);
}
if (!isValid)
{
Clear();
}
return isValid;
}
catch (TimeoutException err) // Timeout exception
{
log.ErrorFormat(err.Message, err);
Clear();
return false;
} catch (Exception err)
{
log.ErrorFormat(err.Message, err);
Clear();
return false;
}
}
Thx for you help!
I found out, like spender mentioned already, the "issue" is the communication over SerialPort. I found an interesting topic here:
SerialPort has a background thread that's waiting for events (via WaitCommEvent). Whenever an event arrives, it queues a threadpool work
item that may result in a call to your event handler. Let's focus on
one of these threadpool threads. It tries to take a lock (quick
reason: this exists to synchronize event raising with closing; for
more details see the end) and once it gets the lock it checks whether
the number of bytes available to read is above the threshold. If so,
it calls your handler.
So this lock is the reason your handler won't be called in separate
threadpool threads at the same time.
That's most certainly the reason why they aren't collected immediately. I also tried replacing the blocking Read in my SendReceiveTelegram method with a SerialDataReceivedEventHandler, but that led to the same result.
So for now I will leave things as they are, unless you can suggest a better solution in which these ThreadPool work items aren't kept in the queue for so long.
Thx for your help and also your negative assessment :-D

How can I get the waiting times for access or failure in a function that is locked by threads?

I'm using a function to add some values to a dynamic array (I know that I could use a list, but it's a requirement that I must use an array).
Right now everything is working, but I need to know when a thread fails to add a value (because the array is locked, and I want to record that time) and when it succeeds. I think I already have the success case covered, as you can see in the function Add.
Insert Data:
private void button6_Click(object sender, EventArgs e)
{
showMessage(numericUpDown5.Value.ToString());
showMessage(numericUpDown6.Value.ToString());
for (int i = 0; i < int.Parse(numericUpDown6.Value.ToString()); i++)
{
ThreadStart start = new ThreadStart(insertDataSecure);
new Thread(start).Start();
}
}
private void insertDataSecure()
{
for (int i = 0; i < int.Parse(numericUpDown5.Value.ToString()); i++)
sArray.addSecure(i);
MessageBox.Show(String.Format("Finished data inserted, you can check the result in: {0}", Path.Combine(
Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location),
"times.txt")), "Result", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
Function to Add:
private object padLock = new object();
public void addSecure(int value)
{
Stopwatch sw = Stopwatch.StartNew();
string values = "";
lock (padLock)
{
try
{
if (array == null)
{
this.size = 1;
Resize(this.size);
array[0] = value;
count++;
}
else
{
count++;
if (size == count)
{
size *= 2;
Resize(size);
}
array[count - 1] = value;
}
}
catch
{
throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
}
sw.Stop();
values = String.Format("Element {0}, Time taken: {1}ms", value.ToString(), sw.Elapsed.TotalMilliseconds);
saveFile(values);
}
}
Sorry for asking this question, but I have read different articles and this is the last one that I tried to use: http://msdn.microsoft.com/en-us/library/4tssbxcw.aspx. When I tried to implement it in my code, it crashed with a strange error.
I'm afraid I might not completely understand the question. It sounds like you want to know how long it takes between the time the thread starts and when it actually acquires the lock. But in that case, the thread does not actually fail to add a value; it is simply delayed some period of time.
On the other hand, you do have an exception handler, so presumably there's some scenario you expect where the Resize() method can throw an exception. (You should catch only those exceptions you expect and know you can handle. A bare catch clause is not a good idea, though the harm is mitigated somewhat by the fact that you do throw an exception from the exception handler.) So I can't help but wonder if that is the failure you're talking about.
That said, assuming the former interpretation is correct – that you want to time how long it takes to acquire the lock – then the following change to your code should do that:
public void addSecure(int value)
{
Stopwatch sw = Stopwatch.StartNew();
string values = "";
lock (padLock)
{
// Save the current timer value here
TimeSpan elapsedToAcquireLock = sw.Elapsed;
try
{
if (array == null)
{
this.size = 1;
Resize(this.size);
array[0] = value;
count++;
}
else
{
count++;
if (size == count)
{
size *= 2;
Resize(size);
}
array[count - 1] = value;
}
}
catch
{
throw new System.ArgumentException("It was impossible to insert, try again later.", "insert");
}
sw.Stop();
values = string.Format(
"Element {0}, Time taken: for lock acquire: {1}ms, for append operation: {2}ms",
value.ToString(),
elapsedToAcquireLock.TotalMilliseconds,
sw.Elapsed.TotalMilliseconds - elapsedToAcquireLock.TotalMilliseconds);
saveFile(values);
}
}
That will display the individual times for the sections of code: acquiring the lock, and then actually adding the value to the array (i.e. the latter not including the time taken to acquire the lock).
If that's not actually what you are trying to do, please edit your question so that it is more clear.
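If the other interpretation is the intended one, where an add should count as failed when the lock cannot be obtained in time, `Monitor.TryEnter` with a timeout is one way to express it. This is only a sketch under that assumption: the `TimedAppender` class, the `TryAdd` method, and the `maxWaitMs` parameter are hypothetical names, and `Array.Resize` stands in for the question's `Resize` method.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public sealed class TimedAppender
{
    private readonly object padLock = new object();
    private int[] array = new int[1];
    private int count;
    private int failedAttempts;

    public int Count
    {
        get { lock (padLock) { return count; } }
    }

    public int FailedAttempts
    {
        // CompareExchange(x, 0, 0) is used here as an atomic read.
        get { return Interlocked.CompareExchange(ref failedAttempts, 0, 0); }
    }

    // Tries to append the value, giving up if the lock is not acquired within
    // maxWaitMs. 'waited' reports how long the lock attempt took either way.
    public bool TryAdd(int value, int maxWaitMs, out TimeSpan waited)
    {
        Stopwatch sw = Stopwatch.StartNew();
        bool acquired = Monitor.TryEnter(padLock, maxWaitMs);
        waited = sw.Elapsed;

        if (!acquired)
        {
            // The lock was busy for the whole timeout: record the failed attempt.
            Interlocked.Increment(ref failedAttempts);
            return false;
        }
        try
        {
            if (count == array.Length)
                Array.Resize(ref array, array.Length * 2); // grow by doubling
            array[count++] = value;
            return true;
        }
        finally
        {
            Monitor.Exit(padLock);
        }
    }
}
```

A caller can log `waited` for both outcomes, which gives the waiting time for successful adds and the time spent before giving up for failed ones.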

.NET 2.0 Processing very large lists using ThreadPool

This is further to my question here
By doing some reading .... I moved away from Semaphores to ThreadPool.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace ThreadPoolTest
{
class Data
{
public int Pos { get; set; }
public int Num { get; set; }
}
class Program
{
static ManualResetEvent[] resetEvents = new ManualResetEvent[20];
static void Main(string[] args)
{
int s = 0;
for (int i = 0; i < 100000; i++)
{
resetEvents[s] = new ManualResetEvent(false);
Data d = new Data();
d.Pos = s;
d.Num = i;
ThreadPool.QueueUserWorkItem(new WaitCallback(Process), (object)d);
if (s >= 19)
{
WaitHandle.WaitAll(resetEvents);
Console.WriteLine("Press Enter to Move forward");
Console.ReadLine();
s = 0;
}
else
{
s = s + 1;
}
}
}
private static void Process(object o)
{
Data d = (Data) o;
Console.WriteLine(d.Num.ToString());
Thread.Sleep(10000);
resetEvents[d.Pos].Set();
}
}
}
This code works and I am able to process in the sets of 20. But I don't like this code because of WaitAll. So let's say I start a batch of 20, and 3 threads take longer time while 17 have finished. Even then I will keep the 17 threads as waiting because of the WaitAll.
WaitAny would have been good... but it seems rather messy that I will have to build so much of control structures like Stacks, Lists, Queues etc in order to use the pool efficiently.
The other thing I don't like is that whole global variable in the class for resetEvents. because this array has to be shared between the Process method and the main loop.
The above code works... but I need your help in improving it.
Again... I am on .NET 2.0 VS 2008. I cannot use .NET 4.0 parallel/async framework.
There are several ways you can do this. Probably the easiest, based on what you've posted above, would be:
const int MaxThreads = 4;
const int ItemsToProcess = 10000;
private Semaphore _sem = new Semaphore(MaxThreads, MaxThreads);
void DoTheWork()
{
int s = 0;
for (int i = 0; i < ItemsToProcess; ++i)
{
_sem.WaitOne();
Data d = new Data();
d.Pos = s;
d.Num = i;
ThreadPool.QueueUserWorkItem(Process, d);
++s;
if (s >= 19)
s = 0;
}
// All items have been assigned threads.
// Now, acquire the semaphore "MaxThreads" times.
// When counter reaches that number, we know all threads are done.
int semCount = 0;
while (semCount < MaxThreads)
{
_sem.WaitOne();
++semCount;
}
// All items are processed
// Clear the semaphore for next time.
_sem.Release(semCount);
}
void Process(object o)
{
// do the processing ...
// release the semaphore
_sem.Release();
}
I only used four threads in my example because that's how many cores I have. It makes little sense to be using 20 threads when only four of them can be processing at any one time. But you're free to increase the MaxThreads number if you like.
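To check that this pattern really caps concurrency, a small self-contained variant can record the highest number of work items observed running at once. This is a sketch rather than the answer's exact code: the `ThrottledPool` class, the `Run` method, and the `Thread.Sleep` stand-in for real work are assumptions made for illustration.

```csharp
using System;
using System.Threading;

public static class ThrottledPool
{
    // Processes itemCount dummy items with at most maxConcurrent in flight.
    // Returns the highest concurrency that was actually observed.
    public static int Run(int itemCount, int maxConcurrent)
    {
        Semaphore slots = new Semaphore(maxConcurrent, maxConcurrent);
        object gate = new object();
        int running = 0;
        int peak = 0;

        for (int i = 0; i < itemCount; i++)
        {
            slots.WaitOne(); // blocks while maxConcurrent items are in flight
            ThreadPool.QueueUserWorkItem(delegate
            {
                try
                {
                    lock (gate)
                    {
                        running++;
                        if (running > peak) peak = running;
                    }
                    Thread.Sleep(5); // stand-in for the real processing
                }
                finally
                {
                    lock (gate) { running--; }
                    slots.Release(); // free the slot even if the work throws
                }
            });
        }

        // Acquire every slot: this succeeds only once all work items have released theirs.
        for (int i = 0; i < maxConcurrent; i++)
            slots.WaitOne();
        slots.Release(maxConcurrent); // restore the semaphore for reuse

        return peak;
    }
}
```

Releasing the slot in a `finally` block matters here: if a work item threw before releasing, the main loop would eventually block forever on `WaitOne`.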
So I'm pretty sure this is all .NET 2.0.
We'll start out defining Action, because I'm so used to using it. If using this solution in 3.5+, remove that definition.
Next, we create a queue of actions based on the input.
After that we define a callback; this callback is the meat of the method.
It first grabs the next item in the queue (using a lock since the queue isn't thread safe). If it ended up having an item to grab it executes that item. Next it adds a new item to the thread pool which is "itself". This is a recursive anonymous method (you don't come across uses of that all that often). This means that when the callback is called for the first time it will execute one item, then schedule a task which will execute another item, and that item will schedule a task that executes another item, and so on. Eventually the queue will run out, and they'll stop queuing more items.
We also want the method to block until we're all done, so for that we keep track of how many of these callbacks have finished through incrementing a counter. When that counter reaches the task limit we signal the event.
Finally we start N of these callbacks in the thread pool.
public delegate void Action();
public static void Execute(IEnumerable<Action> actions, int maxConcurrentItems)
{
object key = new object();
Queue<Action> queue = new Queue<Action>(actions);
int count = 0;
AutoResetEvent whenDone = new AutoResetEvent(false);
WaitCallback callback = null;
callback = delegate
{
Action action = null;
lock (key)
{
if (queue.Count > 0)
action = queue.Dequeue();
}
if (action != null)
{
action();
ThreadPool.QueueUserWorkItem(callback);
}
else
{
if (Interlocked.Increment(ref count) == maxConcurrentItems)
whenDone.Set();
}
};
for (int i = 0; i < maxConcurrentItems; i++)
{
ThreadPool.QueueUserWorkItem(callback);
}
whenDone.WaitOne();
}
Here's another option that doesn't use the thread pool, and just uses a fixed number of threads:
public static void Execute(IEnumerable<Action> actions, int maxConcurrentItems)
{
Thread[] threads = new Thread[maxConcurrentItems];
object key = new object();
Queue<Action> queue = new Queue<Action>(actions);
for (int i = 0; i < maxConcurrentItems; i++)
{
threads[i] = new Thread(new ThreadStart(delegate
{
Action action = null;
do
{
lock (key)
{
if (queue.Count > 0)
action = queue.Dequeue();
else
action = null;
}
if (action != null)
{
action();
}
} while (action != null);
}));
threads[i].Start();
}
for (int i = 0; i < maxConcurrentItems; i++)
{
threads[i].Join();
}
}

Thread program, 1st and last loop never has a ThreadedState other than 'Running'

I have a program that loops x times (10) and uses a specified number of threads (2). I'm using a thread array:
Thread[] myThreadArray = new Thread[2];
My loop counter, I believe, starts the first 2 threads just fine, but when it gets to loop 3, which goes back to thread 0 (zero-based), it hangs. The weird thing is, if I throw a MessageBox.Show() in there to check the ThreadState (which shows thread 0 is still running), it will continue on through 9 of the 10 loops. But if no MessageBox.Show() is there, it hangs when starting the 3rd loop.
I'm using .NET 3.5 Framework (I noticed that .NET 4.0 utilizes something called continuations...)
Here's some code examples:
Thread[] threads = new Thread[2];
int threadCounter = 0;
for (int counter = 0; counter < 10; counter++)
{
if (chkUseThreading.Checked)
{
TestRunResult runResult = new TestRunResult(counter + 1);
TestInfo tInfo = new TestInfo(conn, comm, runResult);
if (threads[threadCounter] != null)
{
// If this is here, then it will continue looping....otherwise, it hangs on the 3rd loop
MessageBox.Show(threads[threadCounter].ThreadState.ToString());
while (threads[threadCounter].IsAlive || threads[threadCounter].ThreadState == ThreadState.Running)
Thread.Sleep(1);
threads[threadCounter] = null;
}
// ExecuteTest is a non-static method
threads[threadCounter] = new Thread(new ThreadStart(delegate { ExecuteTest(tInfo); }));
threads[threadCounter].Name = "PerformanceTest" + (counter + 1);
try
{
threads[threadCounter].Start();
if ((threadCounter + 1) == threadCount)
threadCounter = 0;
else
threadCounter++;
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
Application.DoEvents();
}
}
while (true)
{
int threadsFinished = 0;
for (int counter = 0; counter < threadCount; counter++)
{
if (!threads[counter].IsAlive || threads[counter].ThreadState == ThreadState.Stopped)
threadsFinished++;
}
if (threadsFinished == threadCount)
break;
else
Thread.Sleep(1);
}
Obviously the problem has something to do with how I'm checking whether thread #1 or #2 is done. IsAlive is always true, and ThreadState is always "Running" for the threads of loops 1 and 10.
Where am I going wrong with this?
Update, here's the ExecuteTest() method:
private void ExecuteTest(object tInfo)
{
TestInfo testInfo = tInfo as TestInfo;
Exception error = null;
DateTime endTime;
TimeSpan duration;
DateTime startTime = DateTime.Now;
try
{
if (testInfo.Connection.State != ConnectionState.Open)
{
testInfo.Connection.ConnectionString = connString;
testInfo.Connection.Open();
}
testInfo.Command.ExecuteScalar();
}
catch (Exception ex)
{
error = ex;
failedCounter++;
//if (chkCancelOnError.Checked)
// break;
}
finally
{
endTime = DateTime.Now;
duration = endTime - startTime;
RunTimes.Add(duration);
testInfo.Result.StartTime = startTime;
testInfo.Result.EndTime = endTime;
testInfo.Result.Duration = duration;
testInfo.Result.Error = error;
TestResults.Add(testInfo.Result);
// This part must be threadsafe...
if (lvResults.InvokeRequired)
{
SetTextCallback d = new SetTextCallback(ExecuteTest);
this.Invoke(d, new object[] { tInfo });
}
else
{
lvResults.Items.Add(testInfo.Result.ConvertToListViewItem());
#region Update Results - This wouldn't work in it's own method in the threaded version
const string msPrefix = "ms";
// ShortestRun
TimeSpan shortest = GetShortestRun(RunTimes);
tbShortestRun.Text = shortest.TotalMilliseconds + msPrefix;
// AverageRun
TimeSpan average = GetAverageRun(RunTimes);
tbAverageRun.Text = average.TotalMilliseconds + msPrefix;
// MeanRun
TimeSpan mean = GetMeanRun(RunTimes);
tbMeanRun.Text = mean.TotalMilliseconds + msPrefix;
// LongestRun
TimeSpan longest = GetLongestRun(RunTimes);
tbLongestRun.Text = longest.TotalMilliseconds + msPrefix;
// ErrorCount
int errorCount = GetErrorCount(TestResults);
tbErrorCount.Text = errorCount.ToString();
#endregion
}
testInfo.Command.Dispose();
Application.DoEvents();
}
}
Can you post a snippet of run ()? Doesn't Thread.currentThread().notifyAll() help? May be each thread is waiting for other thread to do something resulting in a deadlock?

What does ParallelQuery's Count count?

I'm testing a self-written element generator (ICollection<string>) and comparing the calculated count to the actual count to get an idea of whether there's an error in my algorithm.
As this generator can produce lots of elements on demand, I'm looking into Partitioner<string>. I have implemented a basic one, which also seems to produce valid enumerators that together yield the same number of strings as calculated.
Now I want to test how this behaves when run in parallel (again, first testing for the correct count):
MyGenerator generator = new MyGenerator();
MyPartitioner partitioner = new MyPartitioner(generator);
int isCount = partitioner.AsParallel().Count();
int shouldCount = generator.Count;
bool same = isCount == shouldCount; // false
I don't get why this count is not equal! What is the ParallelQuery<string> doing?
generator.Count() == generator.Count // true
partitioner.GetPartitions(xyz).Select(enumerator =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
return count;
}).Sum() == generator.Count // true
So, I'm currently not seeing an error in my code. Next I tried to manually count that ParallelQuery<string>:
int count = 0;
partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref count));
count == generator.Count // true
To sum up: everything else counts my enumerable correctly, and ParallelQuery.ForAll enumerates exactly generator.Count elements. But what does ParallelQuery.Count() do?
If the correct count is about 10k, ParallelQuery sees 40k.
internal sealed class PartialWordEnumerator : IEnumerator<string>
{
private object sync = new object();
private readonly IEnumerable<char> characters;
private readonly char[] limit;
private char[] buffer;
private IEnumerator<char>[] enumerators;
private int position = 0;
internal PartialWordEnumerator(IEnumerable<char> characters, char[] state, char[] limit)
{
this.characters = new List<char>(characters);
this.buffer = (char[])state.Clone();
if (limit != null)
{
this.limit = (char[])limit.Clone();
}
this.enumerators = new IEnumerator<char>[this.buffer.Length];
for (int i = 0; i < this.buffer.Length; i++)
{
this.enumerators[i] = SkipTo(state[i]);
}
}
private IEnumerator<char> SkipTo(char c)
{
IEnumerator<char> first = this.characters.GetEnumerator();
IEnumerator<char> second = this.characters.GetEnumerator();
while (second.MoveNext())
{
if (second.Current == c)
{
return first;
}
first.MoveNext();
}
throw new InvalidOperationException();
}
private bool ReachedLimit
{
get
{
if (this.limit == null)
{
return false;
}
for (int i = 0; i < this.buffer.Length; i++)
{
if (this.buffer[i] != this.limit[i])
{
return false;
}
}
return true;
}
}
public string Current
{
get
{
if (this.buffer == null)
{
throw new ObjectDisposedException(typeof(PartialWordEnumerator).FullName);
}
return new string(this.buffer);
}
}
object IEnumerator.Current
{
get { return this.Current; }
}
public bool MoveNext()
{
lock (this.sync)
{
if (this.position == this.buffer.Length)
{
this.position--;
}
if (this.position == -1)
{
return false;
}
IEnumerator<char> enumerator = this.enumerators[this.position];
if (enumerator.MoveNext())
{
this.buffer[this.position] = enumerator.Current;
this.position++;
if (this.position == this.buffer.Length)
{
return !this.ReachedLimit;
}
else
{
return this.MoveNext();
}
}
else
{
this.enumerators[this.position] = this.characters.GetEnumerator();
this.position--;
return this.MoveNext();
}
}
}
public void Dispose()
{
this.position = -1;
this.buffer = null;
}
public void Reset()
{
throw new NotSupportedException();
}
}
public override IList<IEnumerator<string>> GetPartitions(int partitionCount)
{
IEnumerator<string>[] enumerators = new IEnumerator<string>[partitionCount];
List<char> characters = new List<char>(this.generator.Characters);
int length = this.generator.Length;
int characterCount = this.generator.Characters.Count;
int steps = Math.Min(characterCount, partitionCount);
int skip = characterCount / steps;
for (int i = 0; i < steps; i++)
{
char c = characters[i * skip];
char[] state = new string(c, length).ToCharArray();
char[] limit = null;
if ((i + 1) * skip < characterCount)
{
c = characters[(i + 1) * skip];
limit = new string(c, length).ToCharArray();
}
if (i == steps - 1)
{
limit = null;
}
enumerators[i] = new PartialWordEnumerator(characters, state, limit);
}
for (int i = steps; i < partitionCount; i++)
{
enumerators[i] = Enumerable.Empty<string>().GetEnumerator();
}
return enumerators;
}
EDIT: I believe I have found the solution. According to the documentation on IEnumerator.MoveNext (emphasis mine):
If MoveNext passes the end of the collection, the enumerator is
positioned after the last element in the collection and MoveNext
returns false. When the enumerator is at this position, subsequent
calls to MoveNext also return false until Reset is called.
According to the following logic:
private bool ReachedLimit
{
get
{
if (this.limit == null)
{
return false;
}
for (int i = 0; i < this.buffer.Length; i++)
{
if (this.buffer[i] != this.limit[i])
{
return false;
}
}
return true;
}
}
The call to MoveNext() will return false only one time - when the buffer is exactly equal to the limit. Once you have passed the limit, the return value from ReachedLimit will start to become false again, making return !this.ReachedLimit return true, so the enumerator will continue past the end of the limit all the way until it runs out of characters to enumerate. Apparently, in the implementation of ParallelQuery.Count(), MoveNext() is called multiple times when it has reached the end, and since it starts to return a true value again, the enumerator happily continues returning more elements (this is not the case in your custom code that walks the enumerator manually, and apparently also is not the case for the ForAll call, so they "accidentally" return the correct results).
The simplest fix to this is to remember the return value from MoveNext() once it becomes false:
private bool _canMoveNext = true;
public bool MoveNext()
{
if (!_canMoveNext) return false;
...
if (this.position == this.buffer.Length)
{
if (this.ReachedLimit) _canMoveNext = false;
...
}
Now once it begins returning false, it will return false for every future call and this returns the correct result from AsParallel().Count(). Hope this helps!
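The same idea can be shown on a much smaller enumerator. The sketch below is not the question's PartialWordEnumerator: `BoundedCounter` is a hypothetical class that advances its state before testing for equality with the limit, which reproduces the same hazard, and then latches the result in a `finished` flag.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Yields start, start + 1, ... and stops when the value equals limit.
public sealed class BoundedCounter : IEnumerator<int>
{
    private readonly int limit;
    private int next;
    private int current;
    private bool finished; // latched once MoveNext has returned false

    public BoundedCounter(int start, int limit)
    {
        this.next = start;
        this.limit = limit;
    }

    public int Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    public bool MoveNext()
    {
        if (finished) return false; // stay finished on every later call

        int candidate = next++;     // state advances before the limit test
        if (candidate == limit)
        {
            // Without the latch, the next call would see limit + 1,
            // fail this equality test, and start returning true again.
            finished = true;
            return false;
        }
        current = candidate;
        return true;
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

With the latch in place, a consumer that calls MoveNext() again after the end, as the answer suggests ParallelQuery.Count() may do, keeps getting false instead of extra elements.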
The documentation on Partitioner notes (emphasis mine):
The static methods on Partitioner are all thread-safe and may
be used concurrently from multiple threads. However, while a created
partitioner is in use, the underlying data source should not be
modified, whether from the same thread that is using a partitioner or
from a separate thread.
From what I can understand of the code you have given, ParallelQuery.Count() seems the most likely to have thread-safety issues, because it may be iterating multiple enumerators at the same time, whereas all the other approaches run the enumerators one after another. Without seeing the code you are using for MyGenerator and MyPartitioner, it is difficult to determine whether thread-safety issues could be the culprit.
To demonstrate, I have written a simple enumerator that returns the first hundred numbers as strings. Also, I have a partitioner, that distributes the elements in the underlying enumerator over a collection of numPartitions separate lists. Using all the methods you described above on our 12-core server (when I output numPartitions, it uses 12 by default on this machine), I get the expected result of 100 (this is LINQPad-ready code):
void Main()
{
var partitioner = new SimplePartitioner(GetEnumerator());
GetEnumerator().Count().Dump();
partitioner.GetPartitions(10).Select(enumerator =>
{
int count = 0;
while (enumerator.MoveNext())
{
count++;
}
return count;
}).Sum().Dump();
var theCount = 0;
partitioner.AsParallel().ForAll(e => Interlocked.Increment(ref theCount));
theCount.Dump();
partitioner.AsParallel().Count().Dump();
}
// Define other methods and classes here
public IEnumerable<string> GetEnumerator()
{
for (var i = 1; i <= 100; i++)
yield return i.ToString();
}
public class SimplePartitioner : Partitioner<string>
{
private IEnumerable<string> input;
public SimplePartitioner(IEnumerable<string> input)
{
this.input = input;
}
public override IList<IEnumerator<string>> GetPartitions(int numPartitions)
{
var list = new List<string>[numPartitions];
for (var i = 0; i < numPartitions; i++)
list[i] = new List<string>();
var index = 0;
foreach (var s in input)
list[(index = (index + 1) % numPartitions)].Add(s);
IList<IEnumerator<string>> result = new List<IEnumerator<string>>();
foreach (var l in list)
result.Add(l.GetEnumerator());
return result;
}
}
Output:
100
100
100
100
This clearly works. Without more information it is impossible to tell you what is not working in your particular implementation.
