.NET 2.0 Processing very large lists using ThreadPool - c#

This is further to my question here
By doing some reading .... I moved away from Semaphores to ThreadPool.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace ThreadPoolTest
{
class Data
{
public int Pos { get; set; }
public int Num { get; set; }
}
class Program
{
static ManualResetEvent[] resetEvents = new ManualResetEvent[20];
static void Main(string[] args)
{
int s = 0;
for (int i = 0; i < 100000; i++)
{
resetEvents[s] = new ManualResetEvent(false);
Data d = new Data();
d.Pos = s;
d.Num = i;
ThreadPool.QueueUserWorkItem(new WaitCallback(Process), (object)d);
if (s >= 19)
{
WaitHandle.WaitAll(resetEvents);
Console.WriteLine("Press Enter to Move forward");
Console.ReadLine();
s = 0;
}
else
{
s = s + 1;
}
}
}
private static void Process(object o)
{
Data d = (Data) o;
Console.WriteLine(d.Num.ToString());
Thread.Sleep(10000);
resetEvents[d.Pos].Set();
}
}
}
This code works and I am able to process in the sets of 20. But I don't like this code because of WaitAll. So let's say I start a batch of 20, and 3 threads take longer time while 17 have finished. Even then I will keep the 17 threads as waiting because of the WaitAll.
WaitAny would have been good... but it seems rather messy that I will have to build so much of control structures like Stacks, Lists, Queues etc in order to use the pool efficiently.
The other thing I don't like is that whole global variable in the class for resetEvents. because this array has to be shared between the Process method and the main loop.
The above code works... but I need your help in improving it.
Again... I am on .NET 2.0 VS 2008. I cannot use .NET 4.0 parallel/async framework.

There are several ways you can do this. Probably the easiest, based on what you've posted above, would be:
const int MaxThreads = 4;
const int ItemsToProcess = 10000;
private Semaphore _sem = new Semaphore(MaxThreads, MaxThreads);
void DoTheWork()
{
int s = 0;
for (int i = 0; i < ItemsToProcess; ++i)
{
_sem.WaitOne();
Data d = new Data();
d.Pos = s;
d.Num = i;
ThreadPool.QueueUserWorkItem(Process, d);
++s;
if (s >= 19)
s = 0;
}
// All items have been assigned threads.
// Now, acquire the semaphore "MaxThreads" times.
// When counter reaches that number, we know all threads are done.
int semCount = 0;
while (semCount < MaxThreads)
{
_sem.WaitOne();
++semCount;
}
// All items are processed
// Clear the semaphore for next time.
_sem.Release(semCount);
}
void Process(object o)
{
// do the processing ...
// release the semaphore
_sem.Release();
}
I only used four threads in my example because that's how many cores I have. It makes little sense to be using 20 threads when only four of them can be processing at any one time. But you're free to increase the MaxThreads number if you like.

So I'm pretty sure this is all .NET 2.0.
We'll start out defining Action, because I'm so used to using it. If using this solution in 3.5+, remove that definition.
Next, we create a queue of actions based on the input.
After that we define a callback; this callback is the meat of the method.
It first grabs the next item in the queue (using a lock since the queue isn't thread safe). If it ended up having an item to grab it executes that item. Next it adds a new item to the thread pool which is "itself". This is a recursive anonymous method (you don't come across uses of that all that often). This means that when the callback is called for the first time it will execute one item, then schedule a task which will execute another item, and that item will schedule a task that executes another item, and so on. Eventually the queue will run out, and they'll stop queuing more items.
We also want the method to block until we're all done, so for that we keep track of how many of these callbacks have finished through incrementing a counter. When that counter reaches the task limit we signal the event.
Finally we start N of these callbacks in the thread pool.
public delegate void Action();
public static void Execute(IEnumerable<Action> actions, int maxConcurrentItems)
{
object key = new object();
Queue<Action> queue = new Queue<Action>(actions);
int count = 0;
AutoResetEvent whenDone = new AutoResetEvent(false);
WaitCallback callback = null;
callback = delegate
{
Action action = null;
lock (key)
{
if (queue.Count > 0)
action = queue.Dequeue();
}
if (action != null)
{
action();
ThreadPool.QueueUserWorkItem(callback);
}
else
{
if (Interlocked.Increment(ref count) == maxConcurrentItems)
whenDone.Set();
}
};
for (int i = 0; i < maxConcurrentItems; i++)
{
ThreadPool.QueueUserWorkItem(callback);
}
whenDone.WaitOne();
}
Here's another option that doesn't use the thread pool, and just uses a fixed number of threads:
public static void Execute(IEnumerable<Action> actions, int maxConcurrentItems)
{
Thread[] threads = new Thread[maxConcurrentItems];
object key = new object();
Queue<Action> queue = new Queue<Action>(actions);
for (int i = 0; i < maxConcurrentItems; i++)
{
threads[i] = new Thread(new ThreadStart(delegate
{
Action action = null;
do
{
lock (key)
{
if (queue.Count > 0)
action = queue.Dequeue();
else
action = null;
}
if (action != null)
{
action();
}
} while (action != null);
}));
threads[i].Start();
}
for (int i = 0; i < maxConcurrentItems; i++)
{
threads[i].Join();
}
}

Related

Best way to let many worker-threads wait for mainthread and vice versa

I'm looking for a fast way to let many worker threads wait for an event to continue and block the main thread until all worker threads are finished. I first used TPL or AutoResetEvent but since my calculation isn't that expensive the overhead was way too much.
I found a pretty interesting article concerning this problem and got great results (using only one worker thread) with the last synchronization solution (Interlocked.CompareExchange). But I don't know how to utilize it for a scenario where many threads wait for one main tread repeatedly.
Here is an example using single thread, CompareExchange, and Barrier:
static void Main(string[] args)
{
int cnt = 1000000;
var stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < cnt; i++) { }
Console.WriteLine($"Single thread: {stopwatch.Elapsed.TotalSeconds}s");
var run = true;
Task task;
stopwatch.Restart();
int interlock = 0;
task = Task.Run(() =>
{
while (run)
{
while (Interlocked.CompareExchange(ref interlock, 0, 1) != 1) { Thread.Sleep(0); }
interlock = 2;
}
Console.WriteLine($"CompareExchange synced: {stopwatch.Elapsed.TotalSeconds}s");
});
for (int i = 0; i < cnt; i++)
{
interlock = 1;
while (Interlocked.CompareExchange(ref interlock, 0, 2) != 2) { Thread.Sleep(0); }
}
run = false;
interlock = 1;
task.Wait();
run = true;
var barrier = new Barrier(2);
stopwatch.Restart();
task = Task.Run(() =>
{
while (run) { barrier.SignalAndWait(); }
Console.WriteLine($"Barrier synced: {stopwatch.Elapsed.TotalSeconds}s");
});
for (int i = 0; i < cnt; i++) { barrier.SignalAndWait(); }
Thread.Sleep(0);
run = false;
if (barrier.ParticipantsRemaining == 1) { barrier.SignalAndWait(); }
task.Wait();
Console.ReadKey();
}
Average results (in seconds) are:
Single thread: 0,002
CompareExchange: 0,4
Barrier: 1,7
As you can see Barriers' overhead seems to be arround 4 times higher! If someone can rebuild me the CompareExchange-scenario to work with multiple worker threads this would surely help, too!
Sure, 1 second overhead for a million calculations is pretty less! Actually it just interests me.
Edit:
System.Threading.Barrier seems to be the fastest solution for this scenario. For saving a double blocking (all workers ready for work, all workes finished) I used the following code for the best results:
while(work)
{
while (barrier.ParticipantsRemaining > 1) { Thread.Sleep(0); }
//Set work package
barrier.SignalAndWait()
}
It seems like you might want to use a Barrier to synchronise a number of workers with a main thread.
Here's a compilable example. Have a play with it, paying attention to when the output tells you that you can "Press <Return> to signal the workers to start".
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
static class Program
{
static void Main()
{
print("Main thread is starting the workers.");
int numWorkers = 10;
var barrier = new Barrier(numWorkers + 1); // Workers + main (controlling) thread.
for (int i = 0; i < numWorkers; ++i)
{
int n = i; // Prevent modified closure.
Task.Run(() => worker(barrier, n));
}
while (true)
{
print("***************** Press <RETURN> to signal the workers to start");
Console.ReadLine();
print("Main thread is signalling all the workers to start.");
// This will wait for all the workers to issue their call to
// barrier.SignalAndWait() before it returns:
barrier.SignalAndWait();
// At this point, all workers AND the main thread are at the same point.
}
}
static void worker(Barrier barrier, int workerNumber)
{
int iter = 0;
while (true)
{
print($"Worker {workerNumber} on iteration {iter} is waiting for barrier.");
// This will wait for all the other workers AND the main thread
// to issue their call to barrier.SignalAndWait() before it returns:
barrier.SignalAndWait();
// At this point, all workers AND the main thread are at the same point.
int delay = randomDelayMilliseconds();
print($"Worker {workerNumber} got barrier, now sleeping for {delay}");
Thread.Sleep(delay);
print($"Worker {workerNumber} finished work for iteration {iter}.");
}
}
static void print(string message)
{
Console.WriteLine($"[{sw.ElapsedMilliseconds:00000}] {message}");
}
static int randomDelayMilliseconds()
{
lock (rng)
{
return rng.Next(10000) + 5000;
}
}
static Random rng = new Random();
static Stopwatch sw = Stopwatch.StartNew();
}
}

Creating X amount of threads that execute a task at the same time

I am trying to dynamically create X amount of threads(specified by user) then basically have all of them execute some code at the exact same time in intervals of 1 second.
The issue I am having is that the task I am trying to complete relies on a loop to determine if the current IP is equal to the last. (It scans hosts) So since I have this loop inside, it is going off and then the other threads are not getting created, and not executing the code. I would like them to all go off at the same time, wait 1 second(using a timer or something else that doesnt lock the thread since the code it is executing has a timeout it waits for.) Can anyone help me out? Here is my current code:
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(() =>
{
for (int t = 0; t < Convert.ToInt32(total); t += i)
{
while (current <= last)
{
current = Convert.ToUInt32(current + t);
var ip = current.ToIPAddress();
doSomething(ip);
}
}
});
workerThreads.Add(thread);
thread.Start();
}
Don't use a lambda as the body of your thread, otherwise the i value isn't doing what you think it's doing. Instead pass the value into a method.
As for starting all of the threads at the same time do something like the following:
private object syncObj = new object();
void ThreadBody(object boxed)
{
Params params = (Params)boxed;
lock (syncObj)
{
Monitor.Wait(syncObj);
}
// do work here
}
struct Params
{
// passed values here
}
void InitializeThreads()
{
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(new ParameterizedThreadStart(this.ThreadBody, new Params { /* initialize values here */ }));
workerThreads.Add(thread);
thread.Start();
}
lock(syncObj)
{
Monitor.PulseAll(syncObj);
}
}
You're running into closure problems. There's another question that somewhat addresses this, here.
Basically you need to capture the value of i as you create each task. What's happening is by the time the task gets around to actually running, the value of i across all your tasks is the same -- the value at the end of the loop.

How to block new threads until all threads are created and started

I am building a small application simulating a horse race in order to gain some basic skill in working with threads.
My code contains this loop:
for (int i = 0; i < numberOfHorses; i++)
{
horsesThreads[i] = new Thread(horsesTypes[i].Race);
horsesThreads[i].Start(100);
}
In order to keep the race 'fair', I've been looking for a way to make all newly created threads wait until the rest of the new threads are set, and only then launch all of them to start running their methods (Please note that I understand that technically the threads can't be launched at the 'same time')
So basically, I am looking for something like this:
for (int i = 0; i < numberOfHorses; i++)
{
horsesThreads[i] = new Thread(horsesTypes[i].Race);
}
Monitor.LaunchThreads(horsesThreads);
Threading does not promise fairness or deterministic results, so it's not a good way to simulate a race.
Having said that, there are some sync objects that might do what you ask. I think the Barrier class (Fx 4+) is what you want.
The Barrier class is designed to support this.
Here's an example:
using System;
using System.Threading;
namespace Demo
{
class Program
{
private void run()
{
int numberOfHorses = 12;
// Use a barrier with a participant count that is one more than the
// the number of threads. The extra one is for the main thread,
// which is used to signal the start of the race.
using (Barrier barrier = new Barrier(numberOfHorses + 1))
{
var horsesThreads = new Thread[numberOfHorses];
for (int i = 0; i < numberOfHorses; i++)
{
int horseNumber = i;
horsesThreads[i] = new Thread(() => runRace(horseNumber, barrier));
horsesThreads[i].Start();
}
Console.WriteLine("Press <RETURN> to start the race!");
Console.ReadLine();
// Signals the start of the race. None of the threads that called
// SignalAndWait() will return from the call until *all* the
// participants have signalled the barrier.
barrier.SignalAndWait();
Console.WriteLine("Race started!");
Console.ReadLine();
}
}
private static void runRace(int horseNumber, Barrier barrier)
{
Console.WriteLine("Horse " + horseNumber + " is waiting to start.");
barrier.SignalAndWait();
Console.WriteLine("Horse " + horseNumber + " has started.");
}
private static void Main()
{
new Program().run();
}
}
}
[EDIT] I just noticed that Henk already mentioned Barrier, but I'll leave this answer here because it has some sample code.
I'd be looking at a ManualResetEvent as a gate; inside the Thread, decrement a counter; if it is still non-zero, wait on the gate; otherwise, open the gate. Basically:
using System;
using System.Threading;
class Program
{
static void Main()
{
ManualResetEvent gate = new ManualResetEvent(false);
int numberOfThreads = 10, pending = numberOfThreads;
Thread[] threads = new Thread[numberOfThreads];
ParameterizedThreadStart work = name =>
{
Console.WriteLine("{0} approaches the tape", name);
if (Interlocked.Decrement(ref pending) == 0)
{
Console.WriteLine("And they're off!");
gate.Set();
}
else gate.WaitOne();
Race();
Console.WriteLine("{0} crosses the line", name);
};
for (int i = 0; i < numberOfThreads; i++)
{
threads[i] = new Thread(work);
threads[i].Start(i);
}
for (int i = 0; i < numberOfThreads; i++)
{
threads[i].Join();
}
Console.WriteLine("all done");
}
static readonly Random rand = new Random();
static void Race()
{
int time;
lock (rand)
{
time = rand.Next(500,1000);
}
Thread.Sleep(time);
}
}

For loop in many threads

How can I run each call for loop in another thread, but continuation of ExternalMethod should wait to ending of last working thread from for loop (and synchronize) ?
ExternalMethod()
{
//some calculations
for (int i = 0; i < 10; i++)
{
SomeMethod(i);
}
//continuation ExternalMethod
}
One approach would be to use a ManualResetEvent.
Consider the following code (note that this should not be taken as a working example, stuck on OSX so don't have VS nor a C# compiler to hand to check this over):
static ManualResetEvent mre = new ManualResetEvent(false);
static int DoneCount = 0;
static int DoneRequired = 9;
void ExternalMethod() {
mre.Reset();
for (int i = 0; i < 10; i++) {
new Thread(new ThreadStart(ThreadVoid)).Start();
}
mre.WaitOne();
}
void ThreadVoid() {
Interlocked.Increment(ref DoneCount);
if (DoneCount == DoneRequired) {
mre.Set();
}
}
IMPORTANT - This possibly isn't the best way to do it, just an example of using ManualResetEvent, and it will suit your needs perfectly fine.
If you're on .NET 4.0 you can use a Parallel.For loop - explained here.
System.Threading.Tasks.Parallel.For(0, 10, (i) => SomeMethod(i));
One approach is to use a CountdownEvent.
ExternalMethod()
{
//some calculations
var finished = new CountdownEvent(1);
for (int i = 0; i < 10; i++)
{
int capture = i; // This is needed to capture the loop variable correctly.
finished.AddCount();
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
SomeMethod(capture);
}
finally
{
finished.Signal();
}
}, null);
}
finished.Signal();
finished.Wait();
//continuation ExternalMethod
}
If CountdownEvent is not available then here is an alternate approach.
ExternalMethod()
{
//some calculations
var finished = new ManualResetEvent(false);
int pending = 1;
for (int i = 0; i < 10; i++)
{
int capture = i; // This is needed to capture the loop variable correctly.
Interlocked.Increment(ref pending);
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
SomeMethod(capture);
}
finally
{
if (Interlocked.Decrement(ref pending) == 0) finished.Set();
}
}, null);
}
if (Interlocked.Decrement(ref pending) == 0) finished.Set();
finished.WaitOne();
//continuation ExternalMethod
}
Note that in both examples the for loop itself is treating as a parallel work item (it is on a separate thread from the other work items afterall) to avoid a really subtle race condition that might occur if the first work item signals the event before the next work item is queued.
For .NET 3.5, maybe something like this:
Thread[] threads = new Thread[10];
for (int x = 0; x < 10; x++)
{
threads[x] = new Thread(new ParameterizedThreadStart(ThreadFun));
threads[x].Start(x);
}
foreach (Thread thread in threads) thread.Join();
It may seem counterintuitive to use the Join() method, but since you are effectively doing a WaitAll-type pattern, it doesn't matter what order the joins are executed.

Threaded simultaneous jobs

There is a string array myDownloadList containing 100 string URIs. I want to start 5 thread jobs that will pop next URI from myDownloadList (like a stack) and do something with it (download it), until there is no URIs left on a stack (myDownloadList).
What would be the best practice to do this?
Use the ThreadPool, and just setup all of your requests. The ThreadPool will automatically schedule them appropriately.
This will get easier with .NET 4, using the Task Parallel Library. Setting up each request as a Task is very efficient and easy.
Make sure each thread locks the myDownloadList when accessing it. You could use recursion to keep getting the latest one, then when the list is 0 it can just stop the function.
See the example below.
public static List<string> MyList { get; set; }
public static object LockObject { get; set; }
static void Main(string[] args)
{
Console.Clear();
Program.LockObject = new object();
// Create the list
Program.MyList = new List<string>();
// Add 100 items to it
for (int i = 0; i < 100; i++)
{
Program.MyList.Add(string.Format("Item Number = {0}", i));
}
// Start Threads
for (int i = 0; i < 5; i++)
{
Thread thread = new Thread(new ThreadStart(Program.PopItemFromStackAndPrint));
thread.Name = string.Format("Thread # {0}", i);
thread.Start();
}
}
public static void PopItemFromStackAndPrint()
{
if (Program.MyList.Count == 0)
{
return;
}
string item = string.Empty;
lock (Program.LockObject)
{
// Get first Item
item = Program.MyList[0];
Program.MyList.RemoveAt(0);
}
Console.WriteLine("{0}:{1}", System.Threading.Thread.CurrentThread.Name, item);
// Sleep to show other processing for examples only
System.Threading.Thread.Sleep(10);
Program.PopItemFromStackAndPrint();
}

Categories