I have a program where I need to run a number of threads at the same time:
int defaultMaxworkerThreads = 0;
int defaultmaxIOThreads = 0;
ThreadPool.GetMaxThreads(out defaultMaxworkerThreads, out defaultmaxIOThreads);
ThreadPool.SetMaxThreads(defaultMaxworkerThreads, defaultmaxIOThreads);
List<Data1> Data1 = PasswordFileHandler.ReadPasswordFile("Data1.txt");
List<Data1> Data2 = PasswordFileHandler.ReadPasswordFile("Data2.txt");
while (Data1.Count >= 0)
{
List<String> Data1Subset = (from sub in Data1 select sub).Take(NumberOfWordPrThead).ToList();
Data1 = _Data1.Except(Data1Subset ).ToList();
_NumberOfTheadsRunning++;
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadCompleted), new TaskInfo(Data1Subset , Data2 ));
// Start threads based on how many we would like to start
}
How can I run more than one thread at a time? I would like to decide the number of threads at run time, based on the number of cores and a config setting, but my code only ever seems to run one thread.
How should I change it to run on more than one thread?
As @TomTom pointed out, your code will work properly if you set both SetMinThreads and SetMaxThreads. According to MSDN, you also have to take care not to exit the main thread too early, before the ThreadPool has executed the work items:
// used to simulate different work time
static Random random = new Random();
// worker
static private void callback(Object data)
{
Console.WriteLine(String.Format("Called from {0}", data));
System.Threading.Thread.Sleep(random.Next(100, 1000));
}
//
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
ThreadPool.SetMaxThreads(5, minIOC);
ThreadPool.SetMinThreads(3, minIOC);
for(int i = 0; i < 3; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(callback), i.ToString());
}
// give the ThreadPool a chance to run
Thread.Sleep(1000);
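A fixed Thread.Sleep only works if the queued work happens to finish within that time. Here is a minimal sketch of a more deterministic wait using CountdownEvent, assuming each work item can signal when it is done. The class name WaitForPoolDemo and the helper RunAll are made up for illustration and are not part of the original code:

```csharp
using System;
using System.Threading;

static class WaitForPoolDemo
{
    // Hypothetical helper: queues 'count' work items on the ThreadPool and
    // blocks until all of them have run. Returns how many items completed.
    public static int RunAll(int count)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                int copy = i; // per-iteration copy, safe to capture
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        Console.WriteLine("Called from {0}", copy);
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        done.Signal(); // one signal per work item
                    }
                });
            }
            done.Wait(); // returns once every work item has signalled
        }
        return completed;
    }

    static void Main()
    {
        Console.WriteLine("Completed: {0}", RunAll(3));
    }
}
```

The main thread now waits exactly as long as the work takes, however long that is.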
A good alternative to the standard ThreadPool is the Task Parallel Library which introduces the concept of Tasks. Using the Task object you could for example easily start multiple tasks like this:
// global variable
Random random = new Random(); // used to simulate different work time
// unit of work
private void callback(int i)
{
Console.WriteLine(String.Format("Nr. {0}", i));
System.Threading.Thread.Sleep(random.Next(100, 1000));
}
const int max = 5;
var tasks = new System.Threading.Tasks.Task[max];
for (int i = 0; i < max; i++)
{
var copy = i;
// create the tasks and init the work units
tasks[i] = new System.Threading.Tasks.Task(() => callback(copy));
}
// start the parallel execution
foreach (var task in tasks)
{
task.Start();
}
// optionally wait for all tasks to finish
System.Threading.Tasks.Task.WaitAll(tasks);
You could also start the code execution immediately using Task.Factory like this:
const int max = 5;
var tasks = new System.Threading.Tasks.Task[max];
for (int i = 0; i < max; i++)
{
var copy = i;
// start execution immediately
tasks[i] = System.Threading.Tasks.Task.Factory.StartNew(() => callback(copy));
}
System.Threading.Tasks.Task.WaitAll(tasks);
Have a look at this SO post to see the difference between ThreadPool.QueueUserWorkItem vs. Task.Factory.StartNew.
Related
I want to use ThreadPool to complete long-running jobs in less time. My methods do more work, of course, but I prepared a simple example so you can understand my situation. If I run this application, it throws an ArgumentOutOfRangeException on the commented line, and it shows that i is equal to 10. How can it enter the for loop if i is 10?
If I step through this code in the debugger instead of running it, it does not throw the exception and works fine.
public void Test()
{
List<int> list1 = new List<int>();
List<int> list2 = new List<int>();
for (int i = 0; i < 10; i++) list1.Add(i);
for (int i = 0; i < 10; i++) list2.Add(i);
int toProcess = list1.Count;
using (ManualResetEvent resetEvent = new ManualResetEvent(false))
{
for (int i = 0; i < list1.Count; i++)
{
ThreadPool.QueueUserWorkItem(
new WaitCallback(delegate(object state)
{
// ArgumentOutOfRangeException with i=10
Sum(list1[i], list2[i]);
if (Interlocked.Decrement(ref toProcess) == 0)
resetEvent.Set();
}), null);
}
resetEvent.WaitOne();
}
MessageBox.Show("Done");
}
private void Sum(int p, int p2)
{
int sum = p + p2;
}
What is the problem here?
The problem is that i==10, but your lists have 10 items (i.e. a maximum index of 9).
This is because you have a race condition over a captured variable that is being changed before your delegate runs. Will the next iteration of the loop increment the value before the delegate runs, or will your delegate run before the loop increments the value? It's all down to the timing of that specific run.
Your instinct is that i will have a value of 0-9. However, when the loop reaches its termination, i will have a value of 10. Because the delegate captures i, the value of i may well be used after the loop has terminated.
Change your loop as follows:
for (int i = 0; i < list1.Count; i++)
{
var idx=i;
ThreadPool.QueueUserWorkItem(
new WaitCallback(delegate(object state)
{
// idx is a per-iteration copy, so it is always a valid index here
Sum(list1[idx], list2[idx]);
if (Interlocked.Decrement(ref toProcess) == 0)
resetEvent.Set();
}), null);
}
Now your delegate is getting a "private", independent copy of i instead of referring to a single, changing value that is shared between all invocations of the delegate.
I wouldn't worry too much about the difference in behaviour between debug and non-debug modes. That's the nature of race conditions.
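To see the capture behaviour without any threads or timing involved, here is a small sketch that stores delegates in a list and only invokes them after the loop has finished. The class and method names (CaptureDemo, CaptureShared, CaptureCopy) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CaptureDemo
{
    // Captures the loop variable itself: every delegate shares the same 'i'.
    public static int[] CaptureShared(int n)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < n; i++)
            funcs.Add(() => i);
        // The loop has finished, so 'i' equals n for every delegate.
        return funcs.Select(f => f()).ToArray();
    }

    // Captures a fresh copy per iteration: each delegate sees its own value.
    public static int[] CaptureCopy(int n)
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < n; i++)
        {
            int idx = i;
            funcs.Add(() => idx);
        }
        return funcs.Select(f => f()).ToArray();
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", CaptureShared(3))); // prints 3,3,3
        Console.WriteLine(string.Join(",", CaptureCopy(3)));   // prints 0,1,2
    }
}
```

With threads, the delegates may run at any point during or after the loop, which is why the failure in the question shows up only on some runs.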
What is the problem here?
Closure. You're capturing the i variable which isn't doing what you expect it to do.
You'll need to create a copy inside your for loop:
var currentIndex = i;
Sum(list1[currentIndex], list2[currentIndex]);
I have a for loop to create a number of Tasks that are parameterised:
int count = 16;
List<Tuple<ulong, ulong>> brackets = GetBrackets(0L, (ulong)int.MaxValue, count);
Task[] tasks = new Task[count];
s.Reset();
s.Start();
for(int i = 0; i < count; i++)
{
tasks[i] = Task.Run(() => TestLoop(brackets[i].Item1, brackets[i].Item2));
}
Task.WaitAll(tasks);
s.Stop();
times.Add(count, s.Elapsed);
However, when this runs, an exception is thrown by the line inside the For loop, that brackets[i] does not exist, because i at that point is 16, even though the loop is set to run while i < count.
If I change the line to this:
tasks[i] = new Task(() => TestLoop(brackets[0].Item1, brackets[0].Item2));
Then no error is thrown. Also, if I walk through the loop with breakpoints, no issue is thrown.
For repro, I also include GetBrackets, which just breaks a number range into blocks:
private List<Tuple<ulong, ulong>> GetBrackets(ulong start, ulong end, int threads)
{
ulong all = (end - start);
ulong block = (ulong)(all / (ulong)threads);
List<Tuple<ulong, ulong>> brackets = new System.Collections.Generic.List<Tuple<ulong, ulong>>();
ulong last = 0;
for (int i=0; i < threads; i++)
{
brackets.Add(new Tuple<ulong, ulong>(last, (last + block - 1)));
last += block;
}
// Hack
brackets[brackets.Count - 1] = new Tuple<ulong, ulong>(
brackets[brackets.Count - 1].Item1, end);
return brackets;
}
Could anyone shed some light on this?
(This is a duplicate of similar posts, but they're often quite hard to find and the symptoms often differ slightly.)
The problem is that you're capturing the variable i in your loop:
for(int i = 0; i < count; i++)
{
tasks[i] = Task.Run(() => TestLoop(brackets[i].Item1, brackets[i].Item2));
}
You've got a single i variable, and the lambda expression captures it - so by the time your task actually starts executing the code in the lambda expression, it probably won't have the same value as it did before. You need to introduce a separate variable inside the loop, so that each iteration captures a different variable:
for (int i = 0; i < count; i++)
{
int index = i;
tasks[i] = Task.Run(() => TestLoop(brackets[index].Item1, brackets[index].Item2));
}
Alternatively, use LINQ to create the task array:
var tasks = brackets.Select(t => Task.Run(() => TestLoop(t.Item1, t.Item2)))
.ToArray(); // Or ToList
I spent the last few days creating a parallel version of some code (college work), but I have come to a dead end (at least for me): the parallel version is nearly twice as slow as the sequential one, and I have no clue why. Here is the code:
Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount/2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<Thread> ThreadList = new List<Thread>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
Thread t;
//Task t;
ThreadVariables Instance = new ThreadVariables();
stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();
//Lots of attributions...
int SPos = 0, EPos = SS;
for (int i = 0; i < ThreadNumber; i++)
{
int temp = i, StartPos = SPos, EndPos = EPos;
t = new Thread(() =>
{
SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, StartPos, EndPos);
SubPop[temp].RunGA();
SubPop[temp].ShowPopulation();
});
t.Start();
ThreadList.Add(t);
SPos = EPos;
EPos += SS;
}
foreach (Thread a in ThreadList)
a.Join();
double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;
for (int i = 1; i < ThreadNumber; i++)
{
if (BestFit < SubPop[i].BestSol)
{
BestFit = SubPop[i].BestSol;
BestAlign = SubPop[i].TV.Debug;
Variables.ResSave = SubPop[i].TV.ResSave;
Variables.NumSeq = SubPop[i].TV.NumSeq;
}
}
Basically, the code creates an array of objects, instantiates and runs the algorithm in each position of the array, and collects the best value from the array at the end. This type of algorithm works on a three-dimensional data array, and in the parallel version I assign each thread one range of the array to process, avoiding concurrent access to the data. Still, I'm getting slow timings... Any ideas?
I'm using a Core i5, which has four cores (two + two hyperthreading), but any number of threads greater than one makes the code run slower.
What I can explain of the code I'm running in parallel is:
The second method called in the code I posted runs about 10,000 iterations, and each iteration calls one function. This function may or may not call further functions (spread across two different objects per thread) and performs a lot of calculations, depending on a number of factors specific to the algorithm. All of these methods, for a given thread, work on an area of the data array that is not accessed by the other threads.
With System.Linq (PLINQ), much of this can be simplified:
int ThreadNumber = Environment.ProcessorCount/2;
int SS = Variables.PopSize / ThreadNumber;
int numberOfTotalIterations = // I don't know what goes here.
var doneAlgorithms = Enumerable.Range(0, numberOfTotalIterations)
.AsParallel() // Makes the whole thing running in parallel
.WithDegreeOfParallelism(ThreadNumber) // We don't need this line if you want the system to manage the number of parallel processings.
.Select(index=> _runAlgorithmAndReturn(index,SS))
.ToArray(); // This is obsolete if you only need the collection of doneAlgorithms to determine the best one.
// If not, keep it to prevent multiple enumerations.
// The original loop keeps the highest BestSol, so sort descending and take the first one to determine the "best".
// OrderByDescending enumerates the whole sequence itself, which is why the ToArray() call above is optional.
GeneticAlgorithm best = doneAlgorithms.OrderByDescending(algo => algo.BestSol).First();
BestFit = best.BestSol;
BestAlign = best.TV.Debug;
Variables.ResSave = best.TV.ResSave;
Variables.NumSeq = best.TV.NumSeq;
And declare a method to make it a bit more readable
/// <summary>
/// Runs a single algorithm and returns it
/// </summary>
private GeneticAlgorithm _runAlgorithmAndReturn(int index, int SS)
{
int startPos = index * SS;
int endPos = startPos + SS;
var algo = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, startPos, endPos);
algo.RunGA();
algo.ShowPopulation();
return algo;
}
There is a big overhead in creating threads.
Instead of creating new threads, use the ThreadPool, as shown below:
Variables.GetMatrix();
int ThreadNumber = Environment.ProcessorCount / 2;
int SS = Variables.PopSize / ThreadNumber;
//GeneticAlgorithm GA = new GeneticAlgorithm();
Stopwatch stopwatch = new Stopwatch(), st = new Stopwatch(), st1 = new Stopwatch();
List<WaitHandle> WaitList = new List<WaitHandle>();
//List<Task> TaskList = new List<Task>();
GeneticAlgorithm[] SubPop = new GeneticAlgorithm[ThreadNumber];
//Task t;
ThreadVariables Instance = new ThreadVariables();
stopwatch.Start();
st.Start();
PopSettings();
InitialPopulation();
st.Stop();
//lots of attributions...
int SPos = 0, EPos = SS;
for (int i = 0; i < ThreadNumber; i++)
{
int temp = i, StartPos = SPos, EndPos = EPos;
ManualResetEvent wg = new ManualResetEvent(false);
WaitList.Add(wg);
ThreadPool.QueueUserWorkItem((unused) =>
{
SubPop[temp] = new GeneticAlgorithm(Population, NumSeq, SeqSize, MaxOffset, PopFit, Child, Instance, StartPos, EndPos);
SubPop[temp].RunGA();
SubPop[temp].ShowPopulation();
wg.Set();
});
SPos = EPos;
EPos += SS;
}
ManualResetEvent.WaitAll(WaitList.ToArray());
double BestFit = SubPop[0].BestSol;
string BestAlign = SubPop[0].TV.Debug;
for (int i = 1; i < ThreadNumber; i++)
{
if (BestFit < SubPop[i].BestSol)
{
BestFit = SubPop[i].BestSol;
BestAlign = SubPop[i].TV.Debug;
Variables.ResSave = SubPop[i].TV.ResSave;
Variables.NumSeq = SubPop[i].TV.NumSeq;
}
}
Note that instead of using Join to wait for the threads to finish, I'm using WaitHandles.
You're creating the threads yourself, so there is considerable overhead there. Parallelise as the comments suggested. Also make sure that a single unit of work takes long enough: a single thread or work unit should be alive for at least ~20 ms.
Pretty basic things really. I'd suggest you really read up on how multi-threading in .NET works.
I see you don't create too many threads. But the optimal threadcount can't be determined just from the processor count. The built-in Parallel class has advanced algorithms to reduce the overall time.
Partitioning and threading are pretty complex things that require a lot of knowledge to get right, so unless you REALLY know what you're doing, rely on the Parallel class to handle it for you.
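As a rough sketch of what relying on the Parallel class could look like for this kind of "split into ranges, keep the best result" problem: Evaluate below is a hypothetical stand-in for running one sub-population, and FindBest and ParallelBestDemo are made-up names, so treat this as an outline under those assumptions rather than a drop-in replacement for the GA code.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelBestDemo
{
    // Hypothetical stand-in for running one sub-population over [start, end)
    // and returning its best fitness. The real GA work would go here.
    static double Evaluate(int start, int end)
    {
        double best = double.MinValue;
        for (int i = start; i < end; i++)
            best = Math.Max(best, Math.Sin(i));
        return best;
    }

    // Splits [0, popSize) into 'parts' ranges, evaluates them in parallel,
    // and returns the highest fitness found.
    public static double FindBest(int popSize, int parts)
    {
        var results = new double[parts];
        int size = popSize / parts;
        Parallel.For(0, parts, p =>
        {
            int start = p * size;
            // the last range absorbs any remainder
            int end = (p == parts - 1) ? popSize : start + size;
            results[p] = Evaluate(start, end); // each iteration writes only its own slot
        });

        double best = results[0];
        for (int p = 1; p < parts; p++)
            if (results[p] > best) best = results[p];
        return best;
    }

    static void Main()
    {
        Console.WriteLine(FindBest(1000, 4));
    }
}
```

Parallel.For decides how many thread-pool threads to use and blocks until all iterations are done, so no manual thread list, Join, or WaitHandle bookkeeping is needed.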
When I run the following code:
public static double SumRootN(int root)
{
double result = 0;
for (int i = 1; i < 10000000; i++)
{
result += Math.Exp(Math.Log(i) / root);
}
return result;
}
static void Main()
{
ParallelOptions options = new ParallelOptions();
options.MaxDegreeOfParallelism = 2; // -1 is for unlimited. 1 is for sequential.
try
{
Parallel.For(
0,
9,
options,
(i) =>
{
var result = SumRootN(i);
Console.WriteLine("Thread={0}, root {1} : {2} ",Thread.CurrentThread.ManagedThreadId, i, result);
});
}
catch (AggregateException e)
{
Console.WriteLine("Parallel.For has thrown the following (unexpected) exception:\n{0}", e);
}
}
I see that the output is:
There are 3 thread IDs here, but I have specified a MaxDegreeOfParallelism of only 2. So why are there 3 threads doing the work instead of 2?
Quote from http://msdn.microsoft.com/en-us/library/system.threading.tasks.paralleloptions.maxdegreeofparallelism(v=vs.110).aspx
By default, For and ForEach will utilize however many threads the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.
Translation: at most 2 threads will be running at any given moment, but more (or even fewer) than 2 thread-pool threads may be used over the whole run. You can test this with another WriteLine at the start of the task: you'll see that 3 threads never enter concurrently.
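One way to check this without reading interleaved console output is to count how many loop bodies are active at once. This is only a sketch: DegreeDemo and MaxObservedConcurrency are made-up names, and Thread.Sleep stands in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class DegreeDemo
{
    // Runs 'iterations' loop bodies with the given limit and returns the
    // highest number of bodies that were observed running at the same time.
    public static int MaxObservedConcurrency(int limit, int iterations)
    {
        int running = 0, maxSeen = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = limit };

        Parallel.For(0, iterations, options, i =>
        {
            int now = Interlocked.Increment(ref running);

            // record the peak with a compare-and-swap loop
            int seen;
            while (now > (seen = Volatile.Read(ref maxSeen)))
            {
                Interlocked.CompareExchange(ref maxSeen, now, seen);
            }

            Thread.Sleep(20); // simulate work
            Interlocked.Decrement(ref running);
        });

        return maxSeen;
    }

    static void Main()
    {
        Console.WriteLine("Peak concurrency: {0}", MaxObservedConcurrency(2, 20));
    }
}
```

The peak should never exceed the configured limit, even if the set of distinct ManagedThreadId values seen over the whole run is larger.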
I am profiling a C# application and it looks like two threads each calling Dictionary<>.ContainsKey() 5000 times on two separate but identical dictionaries (with only two items) is twice as slow as one thread calling Dictionary<>.ContainsKey() on a single dictionary 10000 times.
I am measuring the "thread time" using a tool called JetBrains dotTrace. I am explicitly using copies of the same data, so I am not using any synchronization primitives. Is it possible that .NET is doing some synchronization behind the scenes?
I have a dual core machine, and there are three threads running: one is blocked using Semaphore.WaitAll() while the work is done on two new threads whose priority is set to ThreadPriority.Highest.
Obvious culprits, like not actually running the code in parallel and not using a release build, have been ruled out.
EDIT:
People want the code. Alright then:
private int ReduceArrayIteration(VM vm, HeronValue[] input, int begin, int cnt)
{
if (cnt <= 1)
return cnt;
int cur = begin;
for (int i=0; i < cnt - 1; i += 2)
{
// The next two calls are effectively dominated by a call
// to dictionary ContainsKey
vm.SetVar(a, input[begin + i]);
vm.SetVar(b, input[begin + i + 1]);
input[cur++] = vm.Eval(expr);
}
if (cnt % 2 == 1)
{
input[cur++] = input[begin + cnt - 1];
}
int r = cur - begin;
Debug.Assert(r >= 1);
Debug.Assert(r < cnt);
return r;
}
// From VM
public void SetVar(string s, HeronValue o)
{
Debug.Assert(o != null);
frames.Peek().SetVar(s, o);
}
// From Frame
public bool SetVar(string s, HeronValue o)
{
for (int i = scopes.Count; i > 0; --i)
{
// Scope is a derived class of Dictionary
Scope tbl = scopes[i - 1];
if (tbl.HasName(s))
{
tbl[s] = o;
return false;
}
}
return false;
}
Now here is the thread-spawning code, which might be naive:
public static class WorkSplitter
{
static WaitHandle[] signals;
public static void ThreadStarter(Object o)
{
Task task = o as Task;
task.Run();
}
public static void SplitWork(List<Task> tasks)
{
signals = new WaitHandle[tasks.Count];
for (int i = 0; i < tasks.Count; ++i)
signals[i] = tasks[i].done;
for (int i = 0; i < tasks.Count; ++i)
{
Thread t = new Thread(ThreadStarter);
t.Priority = ThreadPriority.Highest;
t.Start(tasks[i]);
}
Semaphore.WaitAll(signals);
}
}
Even if there were any locking in Dictionary (there isn't), it could not affect your measurements, since each thread is using a separate one. Running this test 10,000 times is not enough to get reliable timing data: ContainsKey() only takes 20 nanoseconds or so. You'll need at least several million iterations to avoid scheduling artifacts.
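A minimal timing sketch along those lines, using Stopwatch and ten million lookups on a two-item dictionary. ContainsKeyTiming and CountHits are made-up names, and the numbers will vary by machine, so no expected output is shown:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class ContainsKeyTiming
{
    // Calls ContainsKey 'iterations' times, returns the number of hits,
    // and reports the elapsed time through the out parameter.
    public static long CountHits(Dictionary<string, int> dict, string key,
                                 int iterations, out TimeSpan elapsed)
    {
        long hits = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            if (dict.ContainsKey(key)) hits++;
        }
        sw.Stop();
        elapsed = sw.Elapsed;
        return hits;
    }

    static void Main()
    {
        var dict = new Dictionary<string, int> { { "a", 1 }, { "b", 2 } };
        TimeSpan elapsed;

        // warm-up run so JIT compilation is not part of the measurement
        CountHits(dict, "a", 100000, out elapsed);

        const int n = 10000000;
        long hits = CountHits(dict, "a", n, out elapsed);
        Console.WriteLine("{0} lookups, {1} hits, {2:F1} ns per lookup",
            n, hits, elapsed.TotalMilliseconds * 1e6 / n);
    }
}
```

Counting the hits keeps the loop from being optimised away, and the warm-up call keeps one-time startup costs out of the measured run.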