ThreadPool behaves differently in debug mode and at runtime - C#

I want to use ThreadPool to complete long-running jobs in less time. My real methods
do more work, of course, but I prepared a simple example to illustrate
my situation. If I run this application, it throws an ArgumentOutOfRangeException on the commented line, and it shows that i is equal to 10. How can execution enter the body of the for loop if i is 10?
If I step through the code in the debugger instead of running it normally, it does not throw an exception and works fine.
public void Test()
{
    List<int> list1 = new List<int>();
    List<int> list2 = new List<int>();
    for (int i = 0; i < 10; i++) list1.Add(i);
    for (int i = 0; i < 10; i++) list2.Add(i);
    int toProcess = list1.Count;
    using (ManualResetEvent resetEvent = new ManualResetEvent(false))
    {
        for (int i = 0; i < list1.Count; i++)
        {
            ThreadPool.QueueUserWorkItem(
                new WaitCallback(delegate(object state)
                {
                    // ArgumentOutOfRangeException with i=10
                    Sum(list1[i], list2[i]);
                    if (Interlocked.Decrement(ref toProcess) == 0)
                        resetEvent.Set();
                }), null);
        }
        resetEvent.WaitOne();
    }
    MessageBox.Show("Done");
}
private void Sum(int p, int p2)
{
    int sum = p + p2;
}
What is the problem here?

The problem is that i==10, but your lists have 10 items (i.e. a maximum index of 9).
This is because you have a race condition over a captured variable that is being changed before your delegate runs. Will the next iteration of the loop increment the value before the delegate runs, or will your delegate run before the loop increments the value? It's all down to the timing of that specific run.
Your instinct is that i will have a value of 0-9. However, when the loop reaches its termination, i will have a value of 10. Because the delegate captures i, the value of i may well be used after the loop has terminated.
Change your loop as follows:
for (int i = 0; i < list1.Count; i++)
{
    // Copy the loop variable so each delegate captures its own value.
    var idx = i;
    ThreadPool.QueueUserWorkItem(
        new WaitCallback(delegate(object state)
        {
            // idx never changes after this iteration, so it stays in range
            Sum(list1[idx], list2[idx]);
            if (Interlocked.Decrement(ref toProcess) == 0)
                resetEvent.Set();
        }), null);
}
Now your delegate is getting a "private", independent copy of i instead of referring to a single, changing value that is shared between all invocations of the delegate.
I wouldn't worry too much about the difference in behaviour between debug and non-debug modes. That's the nature of race conditions.
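To see the capture behaviour without any threads or timing involved, here is a minimal sketch (the class and method names are made up for illustration). It stores delegates in a list and only invokes them after the loop has finished, which is the worst case of the race described above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureCaptureSketch
{
    // Every delegate captures the same variable i.
    // By the time the delegates run, the loop has ended and i == count.
    public static List<int> SharedVariable(int count)
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < count; i++)
            readers.Add(() => i);
        return readers.Select(read => read()).ToList();
    }

    // Each iteration declares a fresh idx, so each delegate
    // captures a different variable holding that iteration's value.
    public static List<int> PerIterationCopy(int count)
    {
        var readers = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int idx = i;
            readers.Add(() => idx);
        }
        return readers.Select(read => read()).ToList();
    }
}
```

Calling SharedVariable(3) gives 3, 3, 3, while PerIterationCopy(3) gives 0, 1, 2. With ThreadPool the delegates may run at any point during or after the loop, which is why the failure comes and goes with timing.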

What is the problem here?
Closure. You're capturing the i variable which isn't doing what you expect it to do.
You'll need to create a copy inside your for loop:
var currentIndex = i;
Sum(list1[currentIndex], list2[currentIndex]);

Related

How to determine if a sequence is finite or infinite?

I have a C# exercise in which I have some trouble understanding how to complete it correctly, even after doing some research online.
So, basically, I have to write an extension method called Smooth, which takes an infinite sequence of doubles and a single integer N as arguments. The method needs to produce an infinite sequence of doubles in which the element at position i is the average of the source elements from position i-N to position i+N. At the beginning, where i-N would be negative, the window should start at position 0. For example, if N is 3, the result is expected to be:
avg(s0, ..., s3), avg(s0, ..., s4), avg(s0, ..., s5), avg(s0, ..., s6), avg(s1, ..., s7), avg(s2, ..., s8) and so on.
I think I have resolved this part; the problem is that I have to check whether the source is finite. For example, if s = 1.0, 2.0, 3.0, it should throw an error before it even starts to compute the first averages.
My take on the exercise is as seen below:
public static class SmoothExtensionMethod
{
public static IEnumerable<double> Smooth(this IEnumerable<double> source, int N)
{
if (source == null) throw new ArgumentNullException();
if (N < 0) throw new ArgumentOutOfRangeException();
for (int i = 0; i < source.Count(); i++)
{
if (source.ElementAtOrDefault(i + 1) == 0.0)
{
// The idea is to check if the next number exists
// to evaluate if the list is finite or infinite.
throw new FiniteSourceException();
}
}
return Smooth_real(source, N);
}
private static IEnumerable<double> Smooth_real(IEnumerable<double> source, int N)
{
for (int i = 0; i < source.Count(); i++)
{
if (i <= N)
{
yield return source.Take(N + i).Average();
}
else
{
var minRange = i - N;
var maxRange = i + N;
yield return source.ToList().GetRange(minRange, maxRange).Average();
}
}
}
}
I have also made the tests:
[Test]
public void Test1_FiniteSequence_Exception()
{
var source = new List<double>() { 42.0, 49.0, 47.0, 18.0, 19.0, 28.0, 26.0 };
var N = 2;
Assert.That(() => source.Smooth(N), Throws.TypeOf<FiniteSourceException>());
}
private IEnumerable<double> Aux1_Test3_sourceGen(double[] sourceSample)
{
int index = 0;
int maxIndex = sourceSample.Count() - 1;
for (;;)
{
yield return sourceSample.ElementAt(index);
index += (index < maxIndex) ? 1 : -maxIndex;
}
}
private IEnumerable<double> Aux2_Test3_outputGen(double[] expectedSample, int N)
{
int index = 0;
int maxIndex = expectedSample.Count() - 1;
for (;;)
{
yield return expectedSample.ElementAt(index);
index += (index < maxIndex) ? 1 : -(N+1);
}
}
[Test]
public void Test3_Parametric_ExpectedResult(
[Random(2, 7, 3)] int N,
[Values(new double[] { 1.0, 2.0, 3.0, 4.0 })] double[] sourceSample,
[Values(new double[] { 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0 })] double[] expectedSample,
[Values(5, 10, 15, 20, 100, 500)] int howMany)
{
var source = Aux1_Test3_sourceGen(sourceSample).Take(1000);
var expectedOutput = Aux2_Test3_outputGen(expectedSample, N).Take(1000);
Assert.AreEqual(expectedOutput.Take(howMany), source.Smooth(N));
}
Please ignore the values on the test which I have written without any logic behind. Test1 should return an Exception, but Test3 should return the expected values.
Test1 is correctly evaluated but Test3 always fails with a FiniteSourceException instead of the expected result.
From what I have understood, I can't enumerate a method with an infinite yield return without bounding it in some way, so I used ".Take" to take a large number of elements while the method can still generate an "infinite" list. But since the input is in fact finite this way, the exception is always thrown (and I can't even throw exceptions in the "real" method, because for some reason the throws are ignored when the method uses yield).
How can I evaluate correctly those two tests, and determine if the sequence is in fact finite or infinite?
I'm quite sure I'm missing something here.
Sorry for the long post and the mistakes in my grammar (I have roughly translated the text from another language) and thanks in advance.
This isn't possible in general. There is no way to find that out; it is a form of the halting problem, which Turing proved to be undecidable.
From just reading the question itself, what you could do is keep a counter variable:
long counter = 0;
Every time the loop runs, the counter will increase (simple example):
while (true)
{
    counter = counter + 1;
    Console.WriteLine("greer");
}
Once the counter reaches a very big number, e.g. four hundred billion, you could just stop the program and say that it will repeat forever.
const long veryLargeNumber = 400000000000; // four hundred billion
while (true)
{
    counter = counter + 1;
    Console.WriteLine("greer");
    if (counter == veryLargeNumber)
    {
        Console.WriteLine("program will loop");
        break; // give up and assume it would have run forever
    }
}
After the code has run a very large number of times, the program decides that it will loop forever. Although this is a practical heuristic, there could be a program that would stop after four hundred and one billion iterations, and you wouldn't be able to know. As said, Turing showed that no computer can solve this problem in general.
Thank you for reading my small brain answer.
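Since finiteness can't be decided up front, one alternative is to not check at all and instead throw at the moment the source actually runs out. The following is only a sketch under my own assumptions: FiniteSourceException is defined here as a plain exception because the question doesn't show it, Naturals is a made-up helper, and the sliding window follows the avg(s0, ..., s3), avg(s0, ..., s4), ... example from the question. It also shows why the throws in the "real" method seemed to be ignored: an iterator body only runs when somebody enumerates the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed definition; the question does not show this type.
public class FiniteSourceException : Exception { }

public static class SmoothSketch
{
    public static IEnumerable<double> Smooth(this IEnumerable<double> source, int n)
    {
        // Argument checks run immediately because this method has no yield.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        return SmoothIterator(source, n);
    }

    private static IEnumerable<double> SmoothIterator(IEnumerable<double> source, int n)
    {
        // Nothing in here runs until the caller starts enumerating.
        var window = new Queue<double>();
        using (var e = source.GetEnumerator())
        {
            // Read s0 .. s(n-1) so the first average can include s(n).
            for (int k = 0; k < n; k++)
            {
                if (!e.MoveNext()) throw new FiniteSourceException();
                window.Enqueue(e.Current);
            }
            while (true)
            {
                // The source ended, so it was finite after all.
                if (!e.MoveNext()) throw new FiniteSourceException();
                window.Enqueue(e.Current);
                // Keep at most 2n+1 elements: positions i-n .. i+n.
                if (window.Count > 2 * n + 1) window.Dequeue();
                yield return window.Average();
            }
        }
    }

    // Hypothetical infinite source: 1.0, 2.0, 3.0, ...
    public static IEnumerable<double> Naturals()
    {
        for (double x = 1; ; x++) yield return x;
    }
}
```

With this shape, SmoothSketch.Naturals().Smooth(1).Take(3) can be enumerated safely, while a finite list only throws FiniteSourceException once enumeration reaches its end. A test like Test1 would therefore have to enumerate the result (for example with ToList()) inside the assertion rather than just call Smooth.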

For loop to create Tasks going over end condition [duplicate]

I have a for loop to create a number of Tasks that are parameterised:
int count = 16;
List<Tuple<ulong, ulong>> brackets = GetBrackets(0L, (ulong)int.MaxValue, count);
Task[] tasks = new Task[count];
s.Reset();
s.Start();
for(int i = 0; i < count; i++)
{
tasks[i] = Task.Run(() => TestLoop(brackets[i].Item1, brackets[i].Item2));
}
Task.WaitAll(tasks);
s.Stop();
times.Add(count, s.Elapsed);
However, when this runs, an exception is thrown by the line inside the For loop, that brackets[i] does not exist, because i at that point is 16, even though the loop is set to run while i < count.
If I change the line to this:
tasks[i] = new Task(() => TestLoop(brackets[0].Item1, brackets[0].Item2));
Then no error is thrown. Also, if I walk through the loop with breakpoints, no issue is thrown.
For repro, I also include GetBrackets, which just breaks a number range into blocks:
private List<Tuple<ulong, ulong>> GetBrackets(ulong start, ulong end, int threads)
{
ulong all = (end - start);
ulong block = (ulong)(all / (ulong)threads);
List<Tuple<ulong, ulong>> brackets = new System.Collections.Generic.List<Tuple<ulong, ulong>>();
ulong last = 0;
for (int i=0; i < threads; i++)
{
brackets.Add(new Tuple<ulong, ulong>(last, (last + block - 1)));
last += block;
}
// Hack
brackets[brackets.Count - 1] = new Tuple<ulong, ulong>(
brackets[brackets.Count - 1].Item1, end);
return brackets;
}
Could anyone shed some light on this?
(This is a duplicate of similar posts, but they're often quite hard to find and the symptoms often differ slightly.)
The problem is that you're capturing the variable i in your loop:
for(int i = 0; i < count; i++)
{
tasks[i] = Task.Run(() => TestLoop(brackets[i].Item1, brackets[i].Item2));
}
You've got a single i variable, and the lambda expression captures it - so by the time your task actually starts executing the code in the lambda expression, it probably won't have the same value as it did before. You need to introduce a separate variable inside the loop, so that each iteration captures a different variable:
for (int i = 0; i < count; i++)
{
int index = i;
tasks[i] = Task.Run(() => TestLoop(brackets[index].Item1, brackets[index].Item2));
}
Alternatively, use LINQ to create the task array:
var tasks = brackets.Select(t => Task.Run(() => TestLoop(t.Item1, t.Item2)))
                    .ToArray(); // Or ToList
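A foreach loop is another option. From C# 5 onwards the foreach iteration variable is a fresh variable on each iteration, so capturing it is safe, whereas the variable of a for loop is still shared. A rough sketch, where Width is a hypothetical stand-in for TestLoop so that there is something to return:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ForeachCaptureSketch
{
    // Hypothetical stand-in for TestLoop: returns the width of a bracket.
    private static ulong Width(Tuple<ulong, ulong> bracket)
    {
        return bracket.Item2 - bracket.Item1;
    }

    public static ulong[] RunAll(List<Tuple<ulong, ulong>> brackets)
    {
        var tasks = new List<Task<ulong>>();
        foreach (var bracket in brackets)
        {
            // 'bracket' is a new variable per iteration (C# 5 and later),
            // so each lambda sees its own element.
            tasks.Add(Task.Run(() => Width(bracket)));
        }
        Task.WaitAll(tasks.ToArray());
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

This avoids indexing altogether, which also removes the possibility of an out-of-range index.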

Dynamically run more than one thread in c#

I have a program where I need to run a number of threads at the same time
int defaultMaxworkerThreads = 0;
int defaultmaxIOThreads = 0;
ThreadPool.GetMaxThreads(out defaultMaxworkerThreads, out defaultmaxIOThreads);
ThreadPool.SetMaxThreads(defaultMaxworkerThreads, defaultmaxIOThreads);
List<Data1> Data1 = PasswordFileHandler.ReadPasswordFile("Data1.txt");
List<Data1> Data2 = PasswordFileHandler.ReadPasswordFile("Data2.txt");
while (Data1.Count >= 0)
{
List<String> Data1Subset = (from sub in Data1 select sub).Take(NumberOfWordPrThead).ToList();
Data1 = _Data1.Except(Data1Subset ).ToList();
_NumberOfTheadsRunning++;
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadCompleted), new TaskInfo(Data1Subset , Data2 ));
//Start theads based on how many we like to start
}
How can I run more than one thread at a time? I would like to decide the number of threads at run-time, based on the number of cores and a config setting, but my code only ever seems to run on one thread.
How should I change it to run on more than one thread?
As @TomTom pointed out, your code will work properly if you call both SetMinThreads and SetMaxThreads. According to MSDN, you also have to take care not to exit the main thread too early, before the ThreadPool has executed the queued work:
// used to simulate different work time
static Random random = new Random();
// worker
static private void callback(Object data)
{
Console.WriteLine(String.Format("Called from {0}", data));
System.Threading.Thread.Sleep(random.Next(100, 1000));
}
//
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
ThreadPool.SetMaxThreads(5, minIOC);
ThreadPool.SetMinThreads(3, minIOC);
for(int i = 0; i < 3; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(callback), i.ToString());
}
// give the ThreadPool a chance to run
Thread.Sleep(1000);
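Sleeping for a fixed time only works if the work happens to finish within that time. As a hedged alternative, a CountdownEvent lets the main thread wait until every queued item has signalled completion. The class and method names below are made up for the sketch, and the work is reduced to incrementing a counter:

```csharp
using System;
using System.Threading;

public static class CountdownSketch
{
    // Queues workItems callbacks and blocks until all of them have finished.
    public static int RunAndWait(int workItems)
    {
        int completed = 0;
        using (var done = new CountdownEvent(workItems))
        {
            for (int i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        // Stand-in for the real work.
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        // Signal even if the work throws, so Wait() cannot hang.
                        done.Signal();
                    }
                });
            }
            // Returns once Signal() has been called workItems times.
            done.Wait();
        }
        return completed;
    }
}
```

CountdownSketch.RunAndWait(3) returns 3 only after all three callbacks have run, however long they take.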
A good alternative to the standard ThreadPool is the Task Parallel Library which introduces the concept of Tasks. Using the Task object you could for example easily start multiple tasks like this:
// global variable
Random random = new Random(); // used to simulate different work time
// unit of work
private void callback(int i)
{
Console.WriteLine(String.Format("Nr. {0}", i));
System.Threading.Thread.Sleep(random.Next(100, 1000));
}
const int max = 5;
var tasks = new System.Threading.Tasks.Task[max];
for (int i = 0; i < max; i++)
{
var copy = i;
// create the tasks and init the work units
tasks[i] = new System.Threading.Tasks.Task(() => callback(copy));
}
// start the parallel execution
foreach (var task in tasks)
{
task.Start();
}
// optionally wait for all tasks to finish
System.Threading.Tasks.Task.WaitAll(tasks);
You could also start the code execution immediately using Task.Factory like this:
const int max = 5;
var tasks = new System.Threading.Tasks.Task[max];
for (int i = 0; i < max; i++)
{
var copy = i;
// start execution immediately
tasks[i] = System.Threading.Tasks.Task.Factory.StartNew(() => callback(copy));
}
System.Threading.Tasks.Task.WaitAll(tasks);
Have a look at this SO post to see the difference between ThreadPool.QueueUserWorkItem vs. Task.Factory.StartNew.
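If the goal is to choose the degree of parallelism at run-time from the core count and a config setting, Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism may be simpler than tuning the ThreadPool limits. This is only a sketch: configuredLimit stands for a hypothetical config value, and summing word lengths stands in for the real per-item work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class DegreeOfParallelismSketch
{
    // configuredLimit is a hypothetical config value; 0 or less means "use all cores".
    public static long TotalLength(IEnumerable<string> words, int configuredLimit)
    {
        int workers = configuredLimit > 0
            ? Math.Min(configuredLimit, Environment.ProcessorCount)
            : Environment.ProcessorCount;
        var options = new ParallelOptions { MaxDegreeOfParallelism = workers };

        long total = 0;
        Parallel.ForEach(words, options, word =>
        {
            // Stand-in for the real work; Interlocked keeps the shared sum correct.
            Interlocked.Add(ref total, word.Length);
        });
        return total;
    }
}
```

Parallel.ForEach blocks until all items are processed, so no extra waiting code is needed.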

Aggregation of parallel for does not capture all iterations

I have code that works great using a simple For loop, but I'm trying to speed it up. I'm trying to adapt the code to use multiple cores and landed on Parallel For.
At a high level, I'm collecting the results from CalcRoutine for several thousand accounts and storing the results in an array with 6 elements. I'm then re-running this process 1,000 times. The order of the elements within each 6 element array is important, but the order for the final 1,000 iterations of these 6 element arrays is not important. When I run the code using a For loop, I get a 6,000 element long list. However, when I try the Parallel For version, I'm getting something closer to 600. I've confirmed that the line "return localResults" gets called 1,000 times, but for some reason not all 6 element arrays get added to the list TotalResults. Any insight as to why this isn't working would be greatly appreciated.
object locker = new object();
Parallel.For(0, iScenarios, () => new double[6], (int k, ParallelLoopState state, double[] localResults) =>
{
List<double> CalcResults = new List<double>();
for (int n = iStart; n < iEnd; n++)
{
CalcResults.AddRange(CalcRoutine(n, k));
}
localResults = this.SumOfResults(CalcResults);
return localResults;
},
(double[] localResults) =>
{
lock (locker)
{
TotalResults.AddRange(localResults);
}
});
EDIT: Here's the "non parallel" version:
for (int k = 0; k < iScenarios; k++)
{
CalcResults.Clear();
for (int n = iStart; n < iEnd; n++)
{
CalcResults.AddRange(CalcRoutine(n, k));
}
TotalResults.AddRange(SumOfResults(CalcResults));
}
The output for 1 scenario is a list of 6 doubles, 2 scenarios is a list of 12 doubles, ... n scenarios 6n doubles.
Also per one of the questions, I checked the number of times "TotalResults.AddRange..." gets called, and it's not the full 1,000 times. Why wouldn't this be called each time? With the lock, shouldn't each thread wait for this section to become available?
Check the documentation for Parallel.For:
These initial states are passed to the first body invocations on each task. Then, every subsequent body invocation returns a possibly modified state value that is passed to the next body invocation. Finally, the last body invocation on each task returns a state value that is passed to the localFinally delegate.
But your body delegate ignores the incoming value of localResults, which is what the previous iteration within the same task returned, so only the last iteration of each task ever reaches localFinally. Having the loop state be an array makes it tricky to write a correct version. This will work but looks messy:
// EDIT: start each task with an array of length 0 as input to its first iteration
Parallel.For(0, iScenarios, () => new double[0],
(int k, ParallelLoopState state, double[] localResults) =>
{
List<double> CalcResults = new List<double>();
for (int n = iStart; n < iEnd; n++)
{
CalcResults.AddRange(CalcRoutine(n, k));
}
localResults = localResults.Concat(
this.SumOfResults(CalcResults)
).ToArray();
return localResults;
},
(double[] localResults) =>
{
lock (locker)
{
TotalResults.AddRange(localResults);
}
});
(Assuming Linq's enumerable extensions are in scope, for Concat)
I'd suggest using a different data structure (e.g. a List<double> rather than double[]) for the state that more naturally allows more elements to be added to it - but that would mean changing SumOfResults that you've not shown. Or just keep it all a bit more abstract:
Parallel.For(0, iScenarios, () => Enumerable.Empty<double>(),
(int k, ParallelLoopState state, IEnumerable<double> localResults) =>
{
List<double> CalcResults = new List<double>();
for (int n = iStart; n < iEnd; n++)
{
CalcResults.AddRange(CalcRoutine(n, k));
}
return localResults.Concat(this.SumOfResults(CalcResults));
},
(IEnumerable<double> localResults) =>
{
lock (locker)
{
TotalResults.AddRange(localResults);
}
});
(If it had worked the way you seem to have assumed, why would they have you provide two separate delegates, if all it did, on the return from body, was to immediately invoke localFinally with the return value?)
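To make the per-task nature of localFinally visible, here is a small self-contained sketch. It uses a List<double> as the local state and simply adds six values per iteration in place of CalcRoutine and SumOfResults, which are not shown in the question; the names are my own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class LocalStateSketch
{
    // Returns how many elements were collected and how often localFinally ran.
    public static (int Elements, int FinallyCalls) Run(int scenarios)
    {
        var total = new List<double>();
        int finallyCalls = 0;
        object locker = new object();

        Parallel.For<List<double>>(0, scenarios,
            // localInit: one empty list per task, not per iteration.
            () => new List<double>(),
            // body: append to the incoming state and pass it on.
            (k, loopState, local) =>
            {
                for (int j = 0; j < 6; j++) local.Add(k);
                return local;
            },
            // localFinally: runs once per task with everything that task produced.
            local =>
            {
                Interlocked.Increment(ref finallyCalls);
                lock (locker) { total.AddRange(local); }
            });

        return (total.Count, finallyCalls);
    }
}
```

For 1,000 scenarios Elements is 6,000, while FinallyCalls is typically far below 1,000, because localFinally runs once per task rather than once per iteration. That matches the symptom in the question: only the last six values of each task survived.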
Try this:
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
class Program
{
static void Main(string[] args)
{
var iScenarios = 6;
var iStart = 0;
var iEnd = 1000;
var totalResults = new List<double>();
Parallel.For(0, iScenarios, k => {
List<double> calcResults = new List<double>();
for (int n = iStart; n < iEnd; n++)
calcResults.AddRange(CalcRoutine(n, k));
lock (totalResults)
{
totalResults.AddRange(calcResults);
}
});
}
static IEnumerable<double> CalcRoutine(int a, int b)
{
yield return 0;
}
static double[] SumOfResults(IEnumerable<double> source)
{
return source.ToArray();
}
}

Multiple threads slowing down overall dictionary access?

I am profiling a C# application, and it looks like two threads each calling Dictionary<>.ContainsKey() 5000 times on two separate but identical dictionaries (with only two items each) are twice as slow as one thread calling Dictionary<>.ContainsKey() on a single dictionary 10000 times.
I am measuring the "thread time" using a tool called JetBrains dotTrace. I am explicitly using copies of the same data, so I am not using any synchronization primitives. Is it possible that .NET is doing some synchronization behind the scenes?
I have a dual-core machine, and there are three threads running: one is blocked in Semaphore.WaitAll() while the work is done on two new threads whose priority is set to ThreadPriority.Highest.
Obvious culprits, such as not actually running the code in parallel or not using a release build, have been ruled out.
EDIT:
People want the code. Alright then:
private int ReduceArrayIteration(VM vm, HeronValue[] input, int begin, int cnt)
{
if (cnt <= 1)
return cnt;
int cur = begin;
for (int i=0; i < cnt - 1; i += 2)
{
// The next two calls are effectively dominated by a call
// to dictionary ContainsKey
vm.SetVar(a, input[begin + i]);
vm.SetVar(b, input[begin + i + 1]);
input[cur++] = vm.Eval(expr);
}
if (cnt % 2 == 1)
{
input[cur++] = input[begin + cnt - 1];
}
int r = cur - begin;
Debug.Assert(r >= 1);
Debug.Assert(r < cnt);
return r;
}
// From VM
public void SetVar(string s, HeronValue o)
{
Debug.Assert(o != null);
frames.Peek().SetVar(s, o);
}
// From Frame
public bool SetVar(string s, HeronValue o)
{
for (int i = scopes.Count; i > 0; --i)
{
// Scope is a derived class of Dictionary
Scope tbl = scopes[i - 1];
if (tbl.HasName(s))
{
tbl[s] = o;
return true;
}
}
return false;
}
Now here is the thread spawning code, which might be flawed:
public static class WorkSplitter
{
static WaitHandle[] signals;
public static void ThreadStarter(Object o)
{
Task task = o as Task;
task.Run();
}
public static void SplitWork(List<Task> tasks)
{
signals = new WaitHandle[tasks.Count];
for (int i = 0; i < tasks.Count; ++i)
signals[i] = tasks[i].done;
for (int i = 0; i < tasks.Count; ++i)
{
Thread t = new Thread(ThreadStarter);
t.Priority = ThreadPriority.Highest;
t.Start(tasks[i]);
}
Semaphore.WaitAll(signals);
}
}
Even if there were any locking in Dictionary (there isn't), it could not affect your measurements, since each thread is using a separate one. Running this test 10,000 times is not enough to get reliable timing data: ContainsKey() only takes 20 nanoseconds or so. You'll need at least several million iterations to avoid scheduling artifacts.
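As a rough illustration of what "several million iterations" could look like, here is a minimal Stopwatch-based sketch. The names are made up, the dictionary is a plain two-item Dictionary<string, int> rather than the Scope class from the question, and the numbers it produces will vary by machine:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ContainsKeyTimingSketch
{
    // Times 'iterations' ContainsKey calls on a two-item dictionary.
    public static TimeSpan Measure(int iterations, out int hits)
    {
        var table = new Dictionary<string, int> { { "a", 1 }, { "b", 2 } };

        // Warm-up call so JIT compilation is not part of the measurement.
        table.ContainsKey("a");

        hits = 0;
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            // Counting hits keeps the call from being treated as dead code.
            if (table.ContainsKey("a")) hits++;
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Running Measure with something like 10,000,000 iterations on one thread, and then the same call on two threads at once, should give a total time long enough that thread start-up and scheduling no longer dominate.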
