There is a string array myDownloadList containing 100 string URIs. I want to start 5 thread jobs that will pop next URI from myDownloadList (like a stack) and do something with it (download it), until there is no URIs left on a stack (myDownloadList).
What would be the best practice to do this?
Use the ThreadPool, and just setup all of your requests. The ThreadPool will automatically schedule them appropriately.
This will get easier with .NET 4, using the Task Parallel Library. Setting up each request as a Task is very efficient and easy.
Make sure each thread locks the myDownloadList when accessing it. You could use recursion to keep getting the latest one, then when the list is 0 it can just stop the function.
See the example below.
public static List<string> MyList { get; set; }
public static object LockObject { get; set; }
static void Main(string[] args)
{
Console.Clear();
Program.LockObject = new object();
// Create the list
Program.MyList = new List<string>();
// Add 100 items to it
for (int i = 0; i < 100; i++)
{
Program.MyList.Add(string.Format("Item Number = {0}", i));
}
// Start Threads
for (int i = 0; i < 5; i++)
{
Thread thread = new Thread(new ThreadStart(Program.PopItemFromStackAndPrint));
thread.Name = string.Format("Thread # {0}", i);
thread.Start();
}
}
public static void PopItemFromStackAndPrint()
{
if (Program.MyList.Count == 0)
{
return;
}
string item = string.Empty;
lock (Program.LockObject)
{
// Get first Item
item = Program.MyList[0];
Program.MyList.RemoveAt(0);
}
Console.WriteLine("{0}:{1}", System.Threading.Thread.CurrentThread.Name, item);
// Sleep to show other processing for examples only
System.Threading.Thread.Sleep(10);
Program.PopItemFromStackAndPrint();
}
Related
I have been assigned the task of converting a large list (4 million) of ids to usernames. For this purpose I've decided to delegate multiple tasks to my premium proxies.
public class ProxyWorker
{
private static int _proxyCount;
static void Run(List<long> largeList)
{
var taskList = new List<Task>();
for (int i = 0; i < _proxyCount; i++)
{
taskList.Add(Task.Factory.StartNew(() => ConvertOnProxy(i, largeList.Take(1000).ToList())));
}
Task.WaitAll(taskList.ToArray());
}
static void ConvertOnProxy(int proxyId, List<long> idsToConvert)
{
// TODO
}
}
I'm stuck on the part of how would I delegate 1,000 to each task, removing them from the list after they've been select so another thread doesn't take them, and keeping the thread safety?
I understand that my current code just grabs 1,000 items without thinking another task is going to do the exact same thing?
Here's an example of where I would start:
static async Task Test()
{
Queue<int> ids = new Queue<int>(Enumerable.Range(0, 100));
List<Task> tasks = new List<Task>();
for (int i = 0; i < 8; i++)
{
tasks.Add(DoTheThings(ids));
}
await Task.WhenAll(tasks);
}
static async Task DoTheThings(Queue<int> ids)
{
Random rnd = new Random();
int id;
for (;;)
{
lock (ids)
{
if (ids.Count == 0)
{
// All done.
return;
}
id = ids.Dequeue();
}
Debug.WriteLine($"Fetching ID {id}...");
// Simulate variable network delay.
await Task.Delay(rnd.Next(200) + 50);
}
}
This class uses lock and Interlocked.
Both increaseCount.with_lock.Run(); and increaseCount.with_interlock.Run(); prints between 96-100.
I am expecting both of them to print always 100. What did I make mistake?
public static class increaseCount {
public static int counter = 0;
public static readonly object myLock = new object();
public static class with_lock {
public static void Run() {
List<Thread> pool = new List<Thread>();
for(int i = 0; i < 100; i++) {
pool.Add(new Thread(f));
}
Parallel.ForEach(pool, x => x.Start());
Console.WriteLine(counter); //should print 100
}
static void f() {
lock(myLock) {
counter++;
}
}
}
public static class with_interlock {
public static void Run() {
List<Thread> pool = new List<Thread>();
for(int i = 0; i < 100; i++) {
pool.Add(new Thread(f));
}
Parallel.ForEach(pool, x => x.Start());
Console.WriteLine(counter);//should print 100
}
static void f() {
Interlocked.Add(ref counter, 1);
}
}
}
In both cases, you start up your threads but you don't wait for them to complete so you don't reach the 100 before you print the result and the app closes.
If after you start all thread you would wait for all these threads to complete with Thread.Join you would always get the correct result:
List<Thread> pool = new List<Thread>();
for (int i = 0; i < 100; i++)
{
pool.Add(new Thread(f));
}
Parallel.ForEach(pool, x => x.Start());
foreach (var thread in pool)
{
thread.Join();
}
Console.WriteLine(counter);
Note: This seems like a test of some kind, but you should know that blocking multiple threads on a single lock is a huge waste of resources.
I believe it's because your Parallel.Foreach call simply calls start on all the threads in pool but they haven't necessarily completed by the time the loops finished and the Console.WriteLine is called. If you were to insert a Thread.Sleep(5000); // 5s sleep or similar before the Console.WriteLine it would likely always print out what you expect.
Your code is fine. The only problem is your expectation. Basically, not all 100 threads get to run untill the counter is displayed. Try putting a Thread.Sleep(1000) before the Console.WriteLine(counter) and you shall see what I mean.
Edit: wrongly posted as a comment the first time.
I am trying to dynamically create X amount of threads(specified by user) then basically have all of them execute some code at the exact same time in intervals of 1 second.
The issue I am having is that the task I am trying to complete relies on a loop to determine if the current IP is equal to the last. (It scans hosts) So since I have this loop inside, it is going off and then the other threads are not getting created, and not executing the code. I would like them to all go off at the same time, wait 1 second(using a timer or something else that doesnt lock the thread since the code it is executing has a timeout it waits for.) Can anyone help me out? Here is my current code:
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(() =>
{
for (int t = 0; t < Convert.ToInt32(total); t += i)
{
while (current <= last)
{
current = Convert.ToUInt32(current + t);
var ip = current.ToIPAddress();
doSomething(ip);
}
}
});
workerThreads.Add(thread);
thread.Start();
}
Don't use a lambda as the body of your thread, otherwise the i value isn't doing what you think it's doing. Instead pass the value into a method.
As for starting all of the threads at the same time do something like the following:
private object syncObj = new object();
void ThreadBody(object boxed)
{
Params params = (Params)boxed;
lock (syncObj)
{
Monitor.Wait(syncObj);
}
// do work here
}
struct Params
{
// passed values here
}
void InitializeThreads()
{
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(new ParameterizedThreadStart(this.ThreadBody, new Params { /* initialize values here */ }));
workerThreads.Add(thread);
thread.Start();
}
lock(syncObj)
{
Monitor.PulseAll(syncObj);
}
}
You're running into closure problems. There's another question that somewhat addresses this, here.
Basically you need to capture the value of i as you create each task. What's happening is by the time the task gets around to actually running, the value of i across all your tasks is the same -- the value at the end of the loop.
So according to MSDN, and many other places I've read, they use a semaphore and block within the individual threads, like so:
private static Semaphore _pool;
public static void Main()
{
_pool = new Semaphore(0, 3);
for(int i = 1; i <= 1000; i++)
{
Thread t = new Thread(new ParameterizedThreadStart(Worker));
t.Start(i);
}
}
private static void Worker(object num)
{
try
{
_pool.WaitOne();
// do a long process here
}
finally
{
_pool.Release();
}
}
Wouldn't it make more sense to block the process so that you don't create potentially 1000s of threads all at once depending on the number of iterations in Main()? For example:
private static Semaphore _pool;
public static void Main()
{
_pool = new Semaphore(0, 3);
for(int i = 1; i <= 1000; i++)
{
_pool.WaitOne(); // wait for semaphore release here
Thread t = new Thread(new ParameterizedThreadStart(Worker));
t.Start(i);
}
}
private static void Worker(object num)
{
try
{
// do a long process here
}
finally
{
_pool.Release();
}
}
Maybe both ways are not wrong and it depends on the situation? Or there is a better way to do this once there are a lot of iterations?
Edit: This is a windows service, so I'm not blocking the UI thread.
The reason you would normally do it inside the thread is you want to make that exclusive section as small as possible. You don't need the entire thread synchronized, only where that thread accesses the shared resource.
So a more realistic version of Worker is
private static void Worker(object num)
{
//Do a bunch of work that can happen in parallel
try
{
_pool.WaitOne();
// do a small amount of work that can only happen in 3 threads at once
}
finally
{
_pool.Release();
}
//Do a bunch more work that can happen in parallel
}
(P.S. If you are doing something that uses 1000 threads, you are doing something wrong. You should likely rather be using a ThreadPool or Tasks for many short-lived workloads or make each thread do more work.)
Here is how to do it with Parallel.ForEach
private static BlockingCollection<int> _pool;
public static void Main()
{
_pool = new BlockingCollection<int>();
Task.Run(() => //This is run in another thread so it shows data is being taken out and put in at the same time
{
for(int i = 1; i <= 1000; i++)
{
_pool.Add(i);
}
_pool.CompleteAdding(); //Lets the foreach know no new items will be showing up.
});
//This will work on the items in _pool, if there is no items in the collection it will block till CompleteAdding() is called.
Parallel.ForEach(_pool.GetConsumingEnumerable(), new ParallelOptions {MaxDegreeOfParallelism = 3}, Worker);
}
private static void Worker(int num)
{
// do a long process here
}
This is further to my question here
By doing some reading .... I moved away from Semaphores to ThreadPool.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace ThreadPoolTest
{
class Data
{
public int Pos { get; set; }
public int Num { get; set; }
}
class Program
{
static ManualResetEvent[] resetEvents = new ManualResetEvent[20];
static void Main(string[] args)
{
int s = 0;
for (int i = 0; i < 100000; i++)
{
resetEvents[s] = new ManualResetEvent(false);
Data d = new Data();
d.Pos = s;
d.Num = i;
ThreadPool.QueueUserWorkItem(new WaitCallback(Process), (object)d);
if (s >= 19)
{
WaitHandle.WaitAll(resetEvents);
Console.WriteLine("Press Enter to Move forward");
Console.ReadLine();
s = 0;
}
else
{
s = s + 1;
}
}
}
private static void Process(object o)
{
Data d = (Data) o;
Console.WriteLine(d.Num.ToString());
Thread.Sleep(10000);
resetEvents[d.Pos].Set();
}
}
}
This code works and I am able to process in the sets of 20. But I don't like this code because of WaitAll. So let's say I start a batch of 20, and 3 threads take longer time while 17 have finished. Even then I will keep the 17 threads as waiting because of the WaitAll.
WaitAny would have been good... but it seems rather messy that I will have to build so much of control structures like Stacks, Lists, Queues etc in order to use the pool efficiently.
The other thing I don't like is that whole global variable in the class for resetEvents. because this array has to be shared between the Process method and the main loop.
The above code works... but I need your help in improving it.
Again... I am on .NET 2.0 VS 2008. I cannot use .NET 4.0 parallel/async framework.
There are several ways you can do this. Probably the easiest, based on what you've posted above, would be:
const int MaxThreads = 4;
const int ItemsToProcess = 10000;
private Semaphore _sem = new Semaphore(MaxThreads, MaxThreads);
void DoTheWork()
{
int s = 0;
for (int i = 0; i < ItemsToProcess; ++i)
{
_sem.WaitOne();
Data d = new Data();
d.Pos = s;
d.Num = i;
ThreadPool.QueueUserWorkItem(Process, d);
++s;
if (s >= 19)
s = 0;
}
// All items have been assigned threads.
// Now, acquire the semaphore "MaxThreads" times.
// When counter reaches that number, we know all threads are done.
int semCount = 0;
while (semCount < MaxThreads)
{
_sem.WaitOne();
++semCount;
}
// All items are processed
// Clear the semaphore for next time.
_sem.Release(semCount);
}
void Process(object o)
{
// do the processing ...
// release the semaphore
_sem.Release();
}
I only used four threads in my example because that's how many cores I have. It makes little sense to be using 20 threads when only four of them can be processing at any one time. But you're free to increase the MaxThreads number if you like.
So I'm pretty sure this is all .NET 2.0.
We'll start out defining Action, because I'm so used to using it. If using this solution in 3.5+, remove that definition.
Next, we create a queue of actions based on the input.
After that we define a callback; this callback is the meat of the method.
It first grabs the next item in the queue (using a lock since the queue isn't thread safe). If it ended up having an item to grab it executes that item. Next it adds a new item to the thread pool which is "itself". This is a recursive anonymous method (you don't come across uses of that all that often). This means that when the callback is called for the first time it will execute one item, then schedule a task which will execute another item, and that item will schedule a task that executes another item, and so on. Eventually the queue will run out, and they'll stop queuing more items.
We also want the method to block until we're all done, so for that we keep track of how many of these callbacks have finished through incrementing a counter. When that counter reaches the task limit we signal the event.
Finally we start N of these callbacks in the thread pool.
public delegate void Action();
public static void Execute(IEnumerable<Action> actions, int maxConcurrentItems)
{
object key = new object();
Queue<Action> queue = new Queue<Action>(actions);
int count = 0;
AutoResetEvent whenDone = new AutoResetEvent(false);
WaitCallback callback = null;
callback = delegate
{
Action action = null;
lock (key)
{
if (queue.Count > 0)
action = queue.Dequeue();
}
if (action != null)
{
action();
ThreadPool.QueueUserWorkItem(callback);
}
else
{
if (Interlocked.Increment(ref count) == maxConcurrentItems)
whenDone.Set();
}
};
for (int i = 0; i < maxConcurrentItems; i++)
{
ThreadPool.QueueUserWorkItem(callback);
}
whenDone.WaitOne();
}
Here's another option that doesn't use the thread pool, and just uses a fixed number of threads:
public static void Execute(IEnumerable<Action> actions, int maxConcurrentItems)
{
Thread[] threads = new Thread[maxConcurrentItems];
object key = new object();
Queue<Action> queue = new Queue<Action>(actions);
for (int i = 0; i < maxConcurrentItems; i++)
{
threads[i] = new Thread(new ThreadStart(delegate
{
Action action = null;
do
{
lock (key)
{
if (queue.Count > 0)
action = queue.Dequeue();
else
action = null;
}
if (action != null)
{
action();
}
} while (action != null);
}));
threads[i].Start();
}
for (int i = 0; i < maxConcurrentItems; i++)
{
threads[i].Join();
}
}