In this example, is this the correct use of the Parallel.For loop if I want to limit the number of threads that can perform the function DoWork to ten at a time? Will other threads be blocked until one of the ten threads becomes available? If not, what is a better multi-threaded solution that would still let me execute that function 6000+ times?
class Program
{
    static void Main(string[] args)
    {
        ThreadExample ex = new ThreadExample();
    }
}

public class ThreadExample
{
    int limit = 6411;

    public ThreadExample()
    {
        Console.WriteLine("Starting threads...");
        int temp = 0;
        Parallel.For(temp, limit, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
        {
            DoWork(temp);
            temp++;
        });
    }

    public void DoWork(int info)
    {
        //Thread.Sleep(50); //doing some work here.
        int num = info * 5;
        Console.WriteLine("Thread: {0} Result: {1}", info.ToString(), num.ToString());
    }
}
You need to use the i passed to the lambda as the index. Parallel.For relieves you of the hassle of managing the loop counter yourself, but you still need to use it!
Parallel.For(0, limit, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
{
    DoWork(i);
});
As for your other questions:
Yes, this will correctly limit the number of threads working simultaneously.
No threads are blocked. The iterations are queued, and as soon as a thread becomes available it takes the next iteration (in a synchronized manner) from the queue and processes it.
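If you want to verify that claim, here is a minimal sketch (the limit of 6411 and the Thread.Sleep stand-in for DoWork are just placeholders) that tracks how many loop bodies run at once with Interlocked counters:

using System;
using System.Threading;
using System.Threading.Tasks;

class ConcurrencyCheck
{
    static int current = 0;   // loop bodies currently executing
    static int peak = 0;      // highest concurrency observed

    static void Main()
    {
        Parallel.For(0, 6411, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
        {
            int now = Interlocked.Increment(ref current);

            // record the peak without taking a lock
            int oldPeak;
            while (now > (oldPeak = Volatile.Read(ref peak)))
                Interlocked.CompareExchange(ref peak, now, oldPeak);

            Thread.Sleep(5); // stand-in for DoWork(i)
            Interlocked.Decrement(ref current);
        });

        Console.WriteLine("Peak concurrency: {0}", peak); // never exceeds 10
    }
}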
Related
I've coded a method to handle multiple threads for Selenium web browsing. The issue is that right now, if I input 4 tasks and 2 threads, for example, the program says it has finished when only 2 tasks are done.
Edit: Basically, I want the program to wait for all the tasks to complete. I also want that if one thread finishes while the other is still running and there are tasks left to do, it starts another task right away instead of waiting for the second thread to finish.
Thanks, and sorry for the code; I wrote it quickly as an example of how it works.
using System;
using System.IO;
using System.Reflection;
using System.Threading;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main(string[] args)
    {
        Threads(4, 4);
        Console.WriteLine("Program has finished");
        Console.ReadLine();
    }

    static Random ran = new Random();
    static int loop;

    public static void Threads(int number, int threads)
    {
        for (int i = 0; i < number; i++)
        {
            if (threads == 1)
            {
                generateDriver();
            }
            else if (threads > 1)
            {
            start:
                if (loop < threads)
                {
                    loop++;
                    Thread thread = new Thread(() => generateDriver());
                    thread.Start();
                }
                else
                {
                    Task.Delay(2000).Wait();
                    goto start;
                }
            }
        }
    }

    public static void test(IWebDriver driver)
    {
        driver.Navigate().GoToUrl("https://google.com/");
        int timer = ran.Next(100, 2000);
        Task.Delay(timer).Wait();
        Console.WriteLine("[" + DateTime.Now.ToString("hh:mm:ss") + "] - " + "Task done.");
        loop--;
        driver.Close();
    }

    public static void generateDriver()
    {
        ChromeOptions options = new ChromeOptions();
        options.AddArguments("--disable-dev-shm-usage");
        options.AddArguments("--disable-extensions");
        options.AddArguments("--disable-gpu");
        options.AddArguments("window-size=1024,768");
        options.AddArguments("--test-type");

        ChromeDriverService service = ChromeDriverService.CreateDefaultService(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location));
        service.HideCommandPromptWindow = true;
        service.SuppressInitialDiagnosticInformation = true;

        IWebDriver driver = new ChromeDriver(service, options);
        test(driver);
    }
}
Manually keeping track of running threads, waiting for them to finish and reusing ones that are already finished is not trivial.
However, the .NET runtime provides ready-made solutions that you should prefer over handling it yourself.
The simplest way to achieve your desired result is to use a Parallel.For loop and set the MaxDegreeOfParallelism, e.g.:
public static void Threads(int number, int threads)
{
    Parallel.For(0, number,
        new ParallelOptions { MaxDegreeOfParallelism = threads },
        _ => generateDriver());
}
If you really want to do it manually, you will need to use arrays of Thread (or Task) and keep iterating over them, checking whether they have finished and, if they have, replacing them with a new thread. This requires quite a bit more code than the Parallel.For solution (and is unlikely to perform better).
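If you do go down the manual route, a minimal sketch of the idea (using a Task array rather than raw threads, and assuming the same generateDriver method as above) could look like this:

public static void Threads(int number, int threads)
{
    // 'slots' holds at most 'threads' running tasks at any time
    var slots = new Task[Math.Min(threads, number)];
    int started = 0;

    // fill the initial slots
    for (int s = 0; s < slots.Length; s++)
        slots[s] = Task.Run(() => generateDriver());
    started = slots.Length;

    // whenever a slot finishes, reuse it for the next piece of work
    while (started < number)
    {
        int finished = Task.WaitAny(slots);
        slots[finished] = Task.Run(() => generateDriver());
        started++;
    }

    // wait for the remaining tasks before reporting completion
    Task.WaitAll(slots);
}

Because this waits for everything before returning, it also addresses the "program says it finished too early" problem.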
I am working on improving some of my code to increase efficiency. In the original code I was limiting the number of threads to 5, and if I already had 5 active threads I would wait until one finished before starting another one. Now I want to modify this code to allow any number of threads, but I want to make sure that only 5 threads get started every second. For example:
Second 0 - 5 new threads
Second 1 - 5 new threads
Second 2 - 5 new threads ...
Original Code (cleanseDictionary contains usually thousands of items):
ConcurrentDictionary<long, APIResponse> cleanseDictionary = new ConcurrentDictionary<long, APIResponse>();
ConcurrentBag<int> itemsinsec = new ConcurrentBag<int>();
ConcurrentDictionary<long, string> resourceDictionary = new ConcurrentDictionary<long, string>();
DateTime start = DateTime.Now;

Parallel.ForEach(resourceDictionary, new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
{
    lock (itemsinsec)
    {
        ThrottleAPIRequests(itemsinsec, start);
        itemsinsec.Add(1);
    }
    cleanseDictionary.TryAdd(row.Key, _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));
});

private static void ThrottleAPIRequests(ConcurrentBag<int> itemsinsec, DateTime start)
{
    if ((start - DateTime.Now).Milliseconds < 10001 && itemsinsec.Count > 4)
    {
        System.Threading.Thread.Sleep(1000 - (start - DateTime.Now).Milliseconds);
        start = DateTime.Now;
        itemsinsec = new ConcurrentBag<int>();
    }
}
My first thought was to increase MaxDegreeOfParallelism to something much higher and then have a helper method that limits it to only 5 threads per second, but I am not sure if that is the best way to do it, and if it is, I would probably need a lock around that step?
Thanks in advance!
EDIT
I am actually looking for a way to throttle the API requests rather than the actual threads. I was thinking they were one and the same.
Edit 2: My requirements are to send over 5 API requests every second
"Parallel.ForEach" from the MS website
may run in parallel
If you want any degree of fine control over how the threads are managed, this is not the way.
How about creating your own helper class where you can queue jobs with a group id, wait for all jobs of group id X to complete, and have it spawn extra threads as and when required?
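A minimal sketch of that idea (the GroupedJobQueue name and its members are invented for illustration, and it leans on Task and SemaphoreSlim rather than raw threads):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class GroupedJobQueue
{
    private readonly SemaphoreSlim _slots;   // limits how many jobs run at once
    private readonly ConcurrentDictionary<string, ConcurrentBag<Task>> _groups
        = new ConcurrentDictionary<string, ConcurrentBag<Task>>();

    public GroupedJobQueue(int maxConcurrency)
    {
        _slots = new SemaphoreSlim(maxConcurrency, maxConcurrency);
    }

    public void Enqueue(string groupId, Action job)
    {
        var task = Task.Run(async () =>
        {
            await _slots.WaitAsync();          // wait for a free slot
            try { job(); }
            finally { _slots.Release(); }      // free the slot for the next job
        });

        _groups.GetOrAdd(groupId, _ => new ConcurrentBag<Task>()).Add(task);
    }

    public void WaitForGroup(string groupId)
    {
        if (_groups.TryGetValue(groupId, out var tasks))
            Task.WaitAll(tasks.ToArray());
    }
}

This gives you the "wait for all jobs of group X" behaviour and a cap on concurrent work, though it does not by itself throttle by time (requests per second).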
For me the best solution is:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace SomeNamespace
{
    public class RequestLimiter : IRequestLimiter
    {
        private readonly ConcurrentQueue<DateTime> _requestTimes;
        private readonly TimeSpan _timeSpan;
        private readonly object _locker = new object();

        public RequestLimiter()
        {
            _timeSpan = TimeSpan.FromSeconds(1);
            _requestTimes = new ConcurrentQueue<DateTime>();
        }

        public TResult Run<TResult>(int requestsOnSecond, Func<TResult> function)
        {
            // block until this request fits inside the rolling one-second window
            WaitUntilRequestCanBeMade(requestsOnSecond).Wait();
            return function();
        }

        private Task WaitUntilRequestCanBeMade(int requestsOnSecond)
        {
            return Task.Factory.StartNew(() =>
            {
                while (!TryEnqueueRequest(requestsOnSecond).Result) ;
            });
        }

        private Task SynchronizeQueue()
        {
            return Task.Factory.StartNew(() =>
            {
                // drop timestamps that have fallen outside the one-second window
                while (_requestTimes.TryPeek(out var first) && first.Add(_timeSpan) < DateTime.UtcNow)
                    _requestTimes.TryDequeue(out _);
            });
        }

        private Task<bool> TryEnqueueRequest(int requestsOnSecond)
        {
            lock (_locker)
            {
                SynchronizeQueue().Wait();

                if (_requestTimes.Count < requestsOnSecond)
                {
                    _requestTimes.Enqueue(DateTime.UtcNow);
                    return Task.FromResult(true);
                }

                return Task.FromResult(false);
            }
        }
    }
}
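Usage would look something like this (a sketch assuming the _helper.MakeAPIRequest call and dictionaries from the question, with one shared RequestLimiter instance):

var limiter = new RequestLimiter();

Parallel.ForEach(resourceDictionary, new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
{
    // each call blocks until it can run without exceeding 5 requests per second
    var response = limiter.Run(5, () =>
        _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));

    cleanseDictionary.TryAdd(row.Key, response);
});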
I want to be able to send over 5 API requests every second
That's really easy:
while (true) {
    await Task.Delay(TimeSpan.FromSeconds(1));
    await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}
This may not be the best approach, since the requests go out in bursts rather than continuously. There is also timing skew: each iteration takes a bit more than one second, because the delay does not account for how long the requests themselves take. That can be solved with a few lines of timing logic, as sketched below.
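For example, something along these lines (assuming the same RunRequestAsync placeholder, and Stopwatch from System.Diagnostics), which measures how long each batch took and only delays for the remainder of the second:

var stopwatch = new Stopwatch();

while (true) {
    stopwatch.Restart();

    // fire this second's batch of 5 requests
    await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));

    // sleep only for whatever is left of the 1-second window
    var remaining = TimeSpan.FromSeconds(1) - stopwatch.Elapsed;
    if (remaining > TimeSpan.Zero)
        await Task.Delay(remaining);
}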
I'm doing something like this:
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = 10 }, (listitem, state) =>
    {
        //do stuff here
        Console.WriteLine(Process.GetCurrentProcess().Threads.Count);
    });
});
The number of threads in my application is always in excess of 10. What am I doing wrong in limiting the number of threads my app uses?
According to MSDN:
By default, For and ForEach will utilize however many threads the underlying scheduler provides, so changing MaxDegreeOfParallelism from the default only limits how many concurrent tasks will be used.
Thus, the thread count will exceed 10; however, no more than 10 of those threads will run your loop body at a single time. This saves the underlying framework the hassle of having to track and reuse each individual thread, where a fault in one operation could destabilize another. Instead, it draws on arbitrarily many threads and simply throttles how many can run at a time.
You can even test this by adding a Count to the class, and seeing how high it ever goes:
// In the class scope
int _count = 0;
int MaxCount = 0;
object key = new object();

int Count
{
    get { lock (key) return _count; }
    set
    {
        lock (key)
        {
            _count = value;
            if (_count > MaxCount) MaxCount = value;
        }
    }
}
...
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(list, new ParallelOptions { MaxDegreeOfParallelism = 10 }, (listitem, state) =>
    {
        Count++;
        Console.WriteLine(Process.GetCurrentProcess().Threads.Count);
        Count--;
    });
});
MaxDegreeOfParallelism doesn't limit the number of threads of your process (your console app, for example). It limits the number of threads used for the operation you are running in parallel within Parallel.ForEach.
Your application, in the meantime, might be running any number of additional threads in parallel, and Process.GetCurrentProcess().Threads.Count counts them all.
I have an app that takes on an unknown number of tasks. The tasks are blocking (they wait on the network), so I'll need multiple threads to keep busy.
Is there an easy way for me to have a giant list of tasks and worker threads that pull a task when they are idle? At the moment I just start a new thread for each task, which works, but I'd like some control so that if there are 100 tasks I don't have 100 threads.
Assuming that the network I/O classes that you are dealing with expose Begin/End style async methods, then what you want to do is use the TPL TaskFactory.FromAsync method. As laid out in TPL TaskFactory.FromAsync vs Tasks with blocking methods, the FromAsync method will use async I/O under the covers, rather than keeping a thread busy just waiting for the I/O to complete (which is actually not what you want).
The way that async I/O works is that you have a pool of threads that can handle the result of an I/O when the result is ready, so that if you have 100 outstanding I/Os you don't have 100 threads blocked waiting for them. When the whole pool is busy handling I/O results, subsequent results get queued up automatically until a thread frees up to handle them. Keeping a huge pool of threads waiting like that is a scalability disaster: threads are hugely expensive objects to keep around idling.
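As a concrete sketch of the pattern, here is a Begin/End pair wrapped with FromAsync (using NetworkStream.BeginRead/EndRead purely as an example API):

using System.Net.Sockets;
using System.Threading.Tasks;

static Task<int> ReadAsync(NetworkStream stream, byte[] buffer)
{
    // No thread is blocked while the read is outstanding; a pool thread
    // only runs once the I/O completes and the continuation fires.
    return Task<int>.Factory.FromAsync(
        stream.BeginRead, stream.EndRead,
        buffer, 0, buffer.Length, null);
}

You can then compose those tasks with continuations (or Task.WhenAll), so pool threads are only occupied while handling completed I/O.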
Here is an MSDN sample that manages many threads through the ThreadPool:
using System;
using System.Threading;

public class Fibonacci
{
    public Fibonacci(int n, ManualResetEvent doneEvent)
    {
        _n = n;
        _doneEvent = doneEvent;
    }

    // Wrapper method for use with thread pool.
    public void ThreadPoolCallback(Object threadContext)
    {
        int threadIndex = (int)threadContext;
        Console.WriteLine("thread {0} started...", threadIndex);
        _fibOfN = Calculate(_n);
        Console.WriteLine("thread {0} result calculated...", threadIndex);
        _doneEvent.Set();
    }

    // Recursive method that calculates the Nth Fibonacci number.
    public int Calculate(int n)
    {
        if (n <= 1)
        {
            return n;
        }
        return Calculate(n - 1) + Calculate(n - 2);
    }

    public int N { get { return _n; } }
    private int _n;

    public int FibOfN { get { return _fibOfN; } }
    private int _fibOfN;

    private ManualResetEvent _doneEvent;
}

public class ThreadPoolExample
{
    static void Main()
    {
        const int FibonacciCalculations = 10;

        // One event is used for each Fibonacci object.
        ManualResetEvent[] doneEvents = new ManualResetEvent[FibonacciCalculations];
        Fibonacci[] fibArray = new Fibonacci[FibonacciCalculations];
        Random r = new Random();

        // Configure and launch threads using ThreadPool:
        Console.WriteLine("launching {0} tasks...", FibonacciCalculations);
        for (int i = 0; i < FibonacciCalculations; i++)
        {
            doneEvents[i] = new ManualResetEvent(false);
            Fibonacci f = new Fibonacci(r.Next(20, 40), doneEvents[i]);
            fibArray[i] = f;
            ThreadPool.QueueUserWorkItem(f.ThreadPoolCallback, i);
        }

        // Wait for all threads in the pool to finish calculating...
        WaitHandle.WaitAll(doneEvents);
        Console.WriteLine("All calculations are complete.");

        // Display the results...
        for (int i = 0; i < FibonacciCalculations; i++)
        {
            Fibonacci f = fibArray[i];
            Console.WriteLine("Fibonacci({0}) = {1}", f.N, f.FibOfN);
        }
    }
}
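One caveat with this sample: WaitHandle.WaitAll accepts at most 64 handles, so for a larger number of work items you would need to wait in batches, or replace the event array with a single CountdownEvent that every work item signals.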
I have a simple Parallel.ForEach loop that limits itself based on while loops and a static int. If I don't limit it, my CPU stays under 10%; if I limit it, my CPU goes up to 99/100%. How do I safely limit the number of calls to a class within a Parallel.ForEach?
static int ActiveThreads { get; set; }
static int TotalThreads { get; set; }

var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 1;

Parallel.ForEach(urlTable.AsEnumerable(), options, drow =>
{
    using (var WCC = new MasterCrawlerClass())
    {
        while (TotalThreads <= urlTable.Rows.Count)
        {
            if (ActiveThreads <= 9)
            {
                Console.WriteLine("Active Thread #: " + ActiveThreads);
                ActiveThreads++;
                WCC.MasterCrawlBegin(drow);
                TotalThreads++;
                Console.WriteLine("Done Crawling a datarow");
                ActiveThreads--;
            }
        }
    }
});
I need to limit it, and yes, I understand MaxDegreeOfParallelism has its own limit; however, my switch gets bogged down before the CPU on the server hits that limit.
Two things:
1) You don't seem to be using your ParallelOptions() that you created in this example.
2) You can use a Semaphore if for some reason you don't want to use the ParallelOptions.
Semaphore sm = new Semaphore(9, 9); // 9 slots, all free to start with

// acquire a slot, or block if all 9 are already in use
// (this blocks gracefully, without constantly polling `ActiveThreads <= 9`)
sm.WaitOne();

// ... do the work ...

// give the slot back when done
sm.Release();
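Wired into the loop from the question, that would look roughly like this (a sketch that assumes the urlTable, options and MasterCrawlerClass from the question, with MaxDegreeOfParallelism left high enough not to be the bottleneck):

Semaphore sm = new Semaphore(9, 9);

Parallel.ForEach(urlTable.AsEnumerable(), options, drow =>
{
    sm.WaitOne();                 // wait for one of the 9 slots
    try
    {
        using (var WCC = new MasterCrawlerClass())
        {
            WCC.MasterCrawlBegin(drow);
        }
    }
    finally
    {
        sm.Release();             // always give the slot back
    }
});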
I have a simple Parallel.ForEach loop that limits itself based on while loops and a static int. If I don't limit it, my CPU stays under 10%; if I limit it, my CPU goes up to 99/100%.
That is pretty odd. It may be a result of the way you have limited the concurrency with the loop, which, by the way, appears to cause each drow to be crawled many times. I doubt that is what you want. You are getting low CPU utilization because the crawl operation is IO bound.
If you really want to limit the number of concurrent calls to MasterCrawlBegin to 9, then set MaxDegreeOfParallelism = 9. The while loop and the maintenance of TotalThreads and ActiveThreads are not going to work. As a side note, you are incrementing and decrementing the counters in a manner that is not thread-safe.
Change your code to look like this.
int ActiveThreads = 0;

var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 9;

Parallel.ForEach(urlTable.AsEnumerable(), options, drow =>
{
    int x = Interlocked.Increment(ref ActiveThreads);
    Console.WriteLine("Active Thread #: " + x);
    try
    {
        using (var WCC = new MasterCrawlerClass())
        {
            WCC.MasterCrawlBegin(drow);
        }
    }
    finally
    {
        Interlocked.Decrement(ref ActiveThreads);
        Console.WriteLine("Done Crawling a datarow");
    }
});