I have multiple instances of the same application that start at the same time and run for (almost) the same period. Each instance uses Tasks internally for parallelism. Now I want to limit the number of parallel tasks that can run across these application instances. How can I do it? I tried using a semaphore, but no luck. Let's say I am running 5 instances of the application. The first instance creates a semaphore and holds it for n seconds. For those n seconds, the remaining four instances wait, which is fine. But after those n seconds the first instance exits, disposing the semaphore instance, I guess. After that, the remaining 4 start executing in parallel. Please help.
My code would look something like this:
static void Main(string[] args)
{
    List<string> itemList = GetItemList();
    Semaphore throttler = new Semaphore(2, 2, "MySemaPhore");

    foreach (var item in itemList)
    {
        throttler.WaitOne();
        Task.Run(() =>
        {
            try
            {
                DoWork(item);
            }
            finally
            {
                throttler.Release();
            }
        });
    }
}
There is a mistake in your code, but it is not the constructor arguments:
The first argument of the Semaphore constructor is the number of entries initially available, not the number initially occupied, so new Semaphore(2, 2, "MySemaPhore") correctly starts with both slots free. Passing (0, 2) would make every WaitOne() block forever, since no task would ever get far enough to call Release().
This can be seen from the official documentation page, https://learn.microsoft.com/en-us/dotnet/api/system.threading.semaphore?view=netcore-3.1. In its code example, the line "_pool = new Semaphore(0, 3);" deliberately creates the semaphore closed and only opens it later with "_pool.Release(3);". If you want to go deeper into programming, you should build the habit of reading (at least skimming) the official documentation page of whatever class you are using.
The actual bug is that Main never waits for the tasks it starts, so the process can exit while tasks still hold semaphore slots. Unlike a Mutex, a semaphore's count is not restored when the process holding it dies, so the shared count is left permanently decremented for the remaining instances.
Now this is the corrected code:
static void Main(string[] args)
{
    List<string> itemList = GetItemList();

    // initialCount = 2: both slots start out available. All instances
    // open the same kernel object via the name and share the count of 2.
    Semaphore throttler = new Semaphore(2, 2, "MySemaPhore");
    List<Task> tasks = new List<Task>();

    foreach (var item in itemList)
    {
        throttler.WaitOne();
        tasks.Add(Task.Run(() =>
        {
            try
            {
                DoWork(item);
            }
            finally
            {
                throttler.Release();
            }
        }));
    }

    // Wait for all work to finish so the process never exits
    // while it still holds slots of the shared semaphore.
    Task.WaitAll(tasks.ToArray());
}
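If you also want to know which instance actually created the shared semaphore (when the name already exists, the constructor just opens it and ignores the count arguments), there is a constructor overload that reports this; a small sketch:

bool createdNew;
Semaphore throttler = new Semaphore(2, 2, "MySemaPhore", out createdNew);
// createdNew is true only in the process that created the kernel object;
// all later instances merely open the existing semaphore.
if (createdNew)
    Console.WriteLine("This instance created the shared semaphore.");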
I have an IEnumerable with a lot of items that need to be processed in parallel. The items are not CPU intensive.
Ideally these items should be executed simultaneously on 100 threads or more.
I've tried to do this with Parallel.ForEach(). That works, but the problem is that new threads are spawned too slowly. It takes (too) long before Parallel.ForEach() reaches 100 threads. I know there is a MaxDegreeOfParallelism property, but that's a maximum, not a minimum.
Is there a way to execute the foreach immediately on 100 threads?
ThreadPool.SetMinThreads is something we would prefer to avoid, because it has an impact on the whole process.
Is there a solution possible with a custom partitioner?
I'm pinging a lot of devices with a timeout of 5 seconds. How would you do that as quickly as possible with only 4 threads (4 cores)?
I'm going to assume you're pinging devices on a LAN and that each one is identifiable and reachable by an IP address.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;
using System.Threading.Tasks;

namespace PingManyDevices {

    public class DeviceChecker {

        public async Task<PingReply[]> CheckAllDevices(IEnumerable<IPAddress> devices) {
            var pings = devices.Select(address => new Ping().SendPingAsync(address, 5000));
            return await Task.WhenAll(pings);
        }

        /***
         * Maybe push it a little further
         * (renamed: two methods with identical signatures would not compile)
         ***/
        public async Task<PingReply[]> CheckAllDevicesParallel(IEnumerable<IPAddress> devices) {
            var pings = devices.AsParallel().Select(address => new Ping().SendPingAsync(address, 5000));
            return await Task.WhenAll(pings);
        }
    }
}
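Usage would look something like this (the address list is made up for illustration):

var addresses = new[] { IPAddress.Parse("192.168.1.10"), IPAddress.Parse("192.168.1.11") };
var checker = new DeviceChecker();
PingReply[] replies = await checker.CheckAllDevices(addresses);
foreach (var reply in replies)
    Console.WriteLine(reply.Address + ": " + reply.Status + " in " + reply.RoundtripTime + " ms");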
I've had success using ThreadPool instead of Parallel:
public static void ThreadForEach<T>(this IEnumerable<T> items, Action<T> action)
{
    var mres = new List<ManualResetEvent>();
    foreach (var item in items)
    {
        var mre = new ManualResetEvent(false);
        ThreadPool.QueueUserWorkItem((i) =>
        {
            action((T)i);
            mre.Set();
        }, item);
        mres.Add(mre);
    }
    mres.ForEach(mre => mre.WaitOne());
}
In cases where I've had to use this, it ran faster than my attempts with Parallel.ForEach. I can only speculate that this is because it uses already existing threads (instead of taking the overhead of creating new ones).
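Usage is then a drop-in replacement for a plain foreach (Process stands in for whatever per-item work you have):

var items = Enumerable.Range(0, 500).ToList();
items.ThreadForEach(i => Process(i)); // blocks until every queued item has run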
I am working on improving some of my code to increase efficiency. In the original code I limited the number of threads to 5, and if I already had 5 active threads I would wait until one finished before starting another one. Now I want to modify this code so that any number of threads is allowed, but only 5 threads get started every second. For example:
Second 0 - 5 new threads
Second 1 - 5 new threads
Second 2 - 5 new threads ...
Original code (cleanseDictionary usually contains thousands of items):
ConcurrentDictionary<long, APIResponse> cleanseDictionary = new ConcurrentDictionary<long, APIResponse>();
ConcurrentBag<int> itemsinsec = new ConcurrentBag<int>();
ConcurrentDictionary<long, string> resourceDictionary = new ConcurrentDictionary<long, string>();
DateTime start = DateTime.Now;

Parallel.ForEach(resourceDictionary, new ParallelOptions { MaxDegreeOfParallelism = 5 }, row =>
{
    lock (itemsinsec)
    {
        ThrottleAPIRequests(itemsinsec, start);
        itemsinsec.Add(1);
    }
    cleanseDictionary.TryAdd(row.Key, _helper.MakeAPIRequest(string.Format("/endpoint?{0}", row.Value)));
});

private static void ThrottleAPIRequests(ConcurrentBag<int> itemsinsec, DateTime start)
{
    if ((start - DateTime.Now).Milliseconds < 10001 && itemsinsec.Count > 4)
    {
        System.Threading.Thread.Sleep(1000 - (start - DateTime.Now).Milliseconds);
        start = DateTime.Now;
        itemsinsec = new ConcurrentBag<int>();
    }
}
My first thought was to increase MaxDegreeOfParallelism to something much higher and then have a helper method that limits the rate to 5 threads per second, but I am not sure if that is the best way to do it, and if it is, I would probably need a lock around that step?
Thanks in advance!
EDIT
I am actually looking for a way to throttle the API requests rather than the actual threads. I was thinking they were one and the same.
Edit 2: My requirements are to send over 5 API requests every second
"Parallel.ForEach" from the MS website
may run in parallel
If you want any degree of fine control over how the threads are managed, this is not the way.
How about creating your own helper class where you can queue jobs with a group id, allows you to wait for all jobs of group id X to complete, and it spawns extra threads as and when required?
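A minimal sketch of such a helper, with illustrative names, backed by Task.Run so the thread pool spawns extra threads as demand grows:

public class GroupedJobRunner
{
    private readonly ConcurrentDictionary<string, ConcurrentBag<Task>> _groups =
        new ConcurrentDictionary<string, ConcurrentBag<Task>>();

    public void Queue(string groupId, Action job)
    {
        var bag = _groups.GetOrAdd(groupId, _ => new ConcurrentBag<Task>());
        bag.Add(Task.Run(job)); // queued onto the thread pool
    }

    public void WaitForGroup(string groupId)
    {
        // Blocks until every job queued so far under this group id has completed.
        if (_groups.TryGetValue(groupId, out var bag))
            Task.WaitAll(bag.ToArray());
    }
}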
For me the best solution is:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

namespace SomeNamespace
{
    public class RequestLimiter : IRequestLimiter
    {
        private readonly ConcurrentQueue<DateTime> _requestTimes;
        private readonly TimeSpan _timeSpan;
        private readonly object _locker = new object();

        public RequestLimiter()
        {
            _timeSpan = TimeSpan.FromSeconds(1);
            _requestTimes = new ConcurrentQueue<DateTime>();
        }

        public TResult Run<TResult>(int requestsOnSecond, Func<TResult> function)
        {
            WaitUntilRequestCanBeMade(requestsOnSecond).Wait();
            return function();
        }

        private Task WaitUntilRequestCanBeMade(int requestsOnSecond)
        {
            // Spin until a slot in the current one-second window frees up.
            return Task.Factory.StartNew(() =>
            {
                while (!TryEnqueueRequest(requestsOnSecond).Result) ;
            });
        }

        private Task SynchronizeQueue()
        {
            // Drop timestamps that have fallen out of the window. Peek inside
            // the loop condition so the check always sees the current head;
            // peeking once up front would drain fresh entries as well.
            return Task.Factory.StartNew(() =>
            {
                while (_requestTimes.TryPeek(out var first) && first.Add(_timeSpan) < DateTime.UtcNow)
                    _requestTimes.TryDequeue(out _);
            });
        }

        private Task<bool> TryEnqueueRequest(int requestsOnSecond)
        {
            lock (_locker)
            {
                SynchronizeQueue().Wait();
                if (_requestTimes.Count < requestsOnSecond)
                {
                    _requestTimes.Enqueue(DateTime.UtcNow);
                    return Task.FromResult(true);
                }
                return Task.FromResult(false);
            }
        }
    }
}
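A usage sketch (CallApi and workItems are made-up placeholders; IRequestLimiter is assumed to expose the Run method shown above):

var limiter = new RequestLimiter();
foreach (var item in workItems)
{
    // At most 5 invocations enter CallApi per sliding one-second window.
    var response = limiter.Run(5, () => CallApi(item));
}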
I want to be able to send over 5 API requests every second
That's really easy:
while (true) {
    await Task.Delay(TimeSpan.FromSeconds(1));
    await Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}
Maybe not the best approach, since there will be a burst of requests at each tick rather than a continuous flow.
There is also timing skew: each iteration takes more than 1 second (the one-second delay plus however long the requests take), so over time fewer than 5 requests per second go out. This can be solved with a few lines of time logic, as sketched below.
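A sketch of that time logic: pace the loop with a Stopwatch so batches fire on an absolute one-second grid, and don't await the batch itself, so a slow response can't stretch the interval (RunRequestAsync as in the snippet above):

var stopwatch = Stopwatch.StartNew();
var next = TimeSpan.Zero;
while (true)
{
    next += TimeSpan.FromSeconds(1);
    var wait = next - stopwatch.Elapsed;
    if (wait > TimeSpan.Zero)
        await Task.Delay(wait); // sleep only for the remainder of this second
    // Fire-and-forget the batch so slow responses don't delay the next tick.
    _ = Task.WhenAll(Enumerable.Range(0, 5).Select(_ => RunRequestAsync()));
}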
At work, one of our processes uses a SQL database table as a queue. I've been designing a queue reader that checks the table for queued work, updates the row status when work starts, and deletes the row when the work is finished. I'm using Parallel.ForEach to give each process its own thread and setting MaxDegreeOfParallelism to 4.
When the queue reader starts up, it checks for any unfinished work and loads it into a list, then it Concats that list with a method that returns an IEnumerable and runs in an infinite loop checking for new work to do. The idea is that the unfinished work should be processed first and then new work can be picked up as threads become available. However, what I'm seeing is that FetchQueuedWork changes dozens of rows in the queue table to 'Processing' immediately, but only works on a few items at a time.
What I expected was that FetchQueuedWork would only get new work and update the table when a slot opened up in the Parallel.ForEach. What's really odd to me is that it behaves exactly as I would expect in my local development environment, but in production I get the above problem.
I'm using .NET 4. Here is the code:
public void Go()
{
    List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
    IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());
    Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = 4 }, DoWork);
}

private IEnumerable<WorkData> FetchQueuedWork()
{
    while (true)
    {
        var workUnit = WorkData.GetQueuedWorkAndSetStatusToProcessing();
        yield return workUnit;
    }
}

private void DoWork(WorkData workUnit)
{
    if (!workUnit.Loaded)
    {
        System.Threading.Thread.Sleep(5000);
        return;
    }
    Work();
}
I suspect that the default (Release mode?) behaviour is to buffer the input. You might need to create your own partitioner and pass it the NoBuffering option:
List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
IEnumerable<WorkData> work = unfinishedWork.Concat(FetchQueuedWork());
var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
var partitioner = Partitioner.Create(work, EnumerablePartitionerOptions.NoBuffering);

Parallel.ForEach(partitioner, options, DoWork);
Blorgbeard's solution is correct when it comes to .NET 4.5 - hands down.
If you are constrained to .NET 4, you have a few options:
Replace your Parallel.ForEach with work.AsParallel().WithDegreeOfParallelism(4).ForAll(DoWork). PLINQ is more conservative when it comes to buffering items, so this should do the trick.
Write your own enumerable partitioner (good luck).
Create a grotty semaphore-based hack such as this:
(Side-effecting Select used for the sake of brevity)
public void Go()
{
    const int MAX_DEGREE_PARALLELISM = 4;

    using (var semaphore = new SemaphoreSlim(MAX_DEGREE_PARALLELISM, MAX_DEGREE_PARALLELISM))
    {
        List<WorkData> unfinishedWork = WorkData.LoadUnfinishedWork();
        IEnumerable<WorkData> work = unfinishedWork
            .Concat(FetchQueuedWork())
            .Select(w =>
            {
                // Side-effect: bad practice, but easier
                // than writing your own IEnumerable.
                semaphore.Wait();
                return w;
            });

        // You still need to specify MaxDegreeOfParallelism
        // here so as not to saturate your thread pool when
        // Parallel.ForEach's load balancer kicks in.
        Parallel.ForEach(work, new ParallelOptions { MaxDegreeOfParallelism = MAX_DEGREE_PARALLELISM }, workUnit =>
        {
            try
            {
                this.DoWork(workUnit);
            }
            finally
            {
                semaphore.Release();
            }
        });
    }
}
I have an instance of a class that is accessed from several threads. This class takes those calls and adds a tuple to a database. I need this to be done in a serial manner, since, due to some DB constraints, parallel threads could result in an inconsistent database.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

public void AddDData(string info)
{
    Task t = new Task(() => { InsertDataIntoBase(info); });
    _tasks.Add(t);
}

private void InsertWorker()
{
    Task.Factory.StartNew(() =>
    {
        while (!_tasks.IsCompleted)
        {
            Task t;
            if (_tasks.TryTake(out t))
            {
                t.Start();
                t.Wait();
            }
        }
    });
}
AddDData is the method called by multiple threads, and InsertDataIntoBase is a very simple insert that should take a few milliseconds.
The problem is that, for some reason my lack of knowledge doesn't allow me to figure out, sometimes a task is called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and, as I suspected, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how do I properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();

public void AddDData(Data info)
{
    _tasks.Add(info);
}

private void InsertWorker()
{
    Task.Factory.StartNew(() =>
    {
        while (!_tasks.IsCompleted)
        {
            Data info;
            if (_tasks.TryTake(out info))
            {
                InsertIntoDB(info);
            }
        }
    });
}
Note that I got rid of Tasks, as I'm relying on the synchronous InsertIntoDB call (it is inside the loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances go into the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();

static void Main(string[] args)
{
    InsertWorker();
    GenerateItems(10, 1000);
    _itemsToProcess.CompleteAdding();
}

private static void InsertWorker()
{
    Task.Factory.StartNew(() =>
    {
        while (!_itemsToProcess.IsCompleted)
        {
            string t;
            if (_itemsToProcess.TryTake(out t))
            {
                // Do whatever needs doing here
                // Order should be guaranteed since BlockingCollection
                // uses a ConcurrentQueue as a backing store by default.
                // http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
                Console.WriteLine(t);
            }
        }
    });
}

private static void GenerateItems(int count, int maxDelayInMs)
{
    Random r = new Random();
    string[] items = new string[count];
    for (int i = 0; i < count; i++)
    {
        items[i] = i.ToString();
    }

    // Simulate many threads adding items to the collection
    items
        .AsParallel()
        .WithDegreeOfParallelism(4)
        .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
        .Select((x) =>
        {
            Thread.Sleep(r.Next(maxDelayInMs));
            _itemsToProcess.Add(x);
            return x;
        }).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment,
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure the caller is not reusing the same info instance across multiple calls, since that reference is captured in the Task lambda; the sketch below shows the pitfall.
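A hypothetical caller demonstrating the pitfall (Data, Value and worker are made-up names): one instance is mutated and queued repeatedly, so every queued entry ends up observing the last mutation:

var info = new Data();
for (int i = 0; i < 3; i++)
{
    info.Value = i;        // mutates the single shared instance
    worker.AddDData(info); // all three queue entries reference that same object
}
// Fix: create a fresh instance per call, e.g. worker.AddDData(new Data { Value = i });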
Based on the trace you provided, the only logical possibility is that you have called InsertWorker twice (or more). There would then be two background threads waiting for items to appear in the collection, and occasionally they both manage to grab an item and begin executing it.
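If that is the case, a simple guard makes InsertWorker idempotent, so an accidental second call cannot start another consumer (a sketch against the simplified code from Update 2):

private int _workerStarted; // 0 = not started, 1 = running

private void InsertWorker()
{
    // Only the first caller flips the flag and starts the consumer loop.
    if (Interlocked.CompareExchange(ref _workerStarted, 1, 0) != 0)
        return;

    Task.Factory.StartNew(() =>
    {
        while (!_tasks.IsCompleted)
        {
            Data info;
            if (_tasks.TryTake(out info))
            {
                InsertIntoDB(info);
            }
        }
    });
}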
I am automating some tasks on my website, but I'm currently stuck.
public void Execute(JobExecutionContext context)
{
    var linqFindAccount = from Account in MainAccounts
                          where Account.Done == false
                          select Account;

    foreach (var acc in linqFindAccount)
    {
        acc.Done = true;
        // stuff
    }
}
The issue is that when I start multiple threads, the first threads get assigned to the same first account, because they set the Done value to true at the same time. How am I supposed to avoid this?
EDIT:
private object locker = new object();

public void Execute(JobExecutionContext context)
{
    lock (locker)
    {
        var linqFindAccount = from Account in MainAccounts
                              where Account.Done == false
                              select Account;

        foreach (var acc in linqFindAccount)
        {
            Console.WriteLine(context.JobDetail.Name + " assigned to " + acc.Mail);
            acc.Done = true;
            // stuff
        }
    }
}
Instance [ 2 ] assigned to firstmail#hotmail.com
Instance [ 1 ] assigned to firstmail#hotmail.com
The first two threads got assigned to the first account, even though the list contains 30 accounts.
Thanks.
Use
private static readonly object locker = new object();
instead of
private object locker = new object();
Your problem is that deferred execution kicks in when the foreach loop starts: the query result is evaluated once at that point and not re-evaluated on every iteration, so every thread works with its own list of the items. When one thread sets an Account to done, the lists the other threads are iterating over still contain that object.
A queue is more suitable in this case. Put the items in a shared queue, let the loops take items off the queue, and let them finish when the queue is empty; see the sketch below.
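A sketch of that approach, reusing the Account and MainAccounts names from the question (assuming MainAccounts is reachable where the queue is filled): the queue is populated once, and every job dequeues from it, so each account is handed out exactly once:

private static ConcurrentQueue<Account> _pending; // filled once, shared by all job instances

public static void LoadPending()
{
    _pending = new ConcurrentQueue<Account>(MainAccounts.Where(a => !a.Done));
}

public void Execute(JobExecutionContext context)
{
    Account acc;
    while (_pending.TryDequeue(out acc)) // TryDequeue is atomic: no two jobs get the same account
    {
        acc.Done = true;
        // stuff
    }
}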
A few problems with your code:
1) Assuming you use stateless Quartz jobs, your lock does not do any good. Quartz creates a new job instance every time it fires a trigger, so each instance locks on its own object. That is why you see the same account processed twice. It would only work if you used a stateful job (IStatefulJob), or made the lock static. But read on.
2) Even if 1) is fixed, it defeats the purpose of having multiple threads, because they will all wait for each other on the same lock. You might as well have one thread doing the work.
I don't know enough about your requirements, especially what's going on in // stuff. It may be that you don't need this code to run on multiple threads at all and sequential execution will do just fine. I'll assume that is not the case and that you want multiple threads. The easiest way is to have only one Quartz job. In this job, load the Accounts in chunks of, say, 100 accounts each; that gives you 5 chunks for 500 accounts. Offload each chunk's processing to the thread pool, which will take care of using an optimal number of threads. This is a poor man's producer/consumer queue.
public void Execute(JobExecutionContext context) {
    var linqFindAccount = from Account in MainAccounts
                          where Account.Done == false
                          select Account;

    IList<IList<Account>> chunks = linqFindAccount.SplitIntoChunks(/* TODO */);

    foreach (IList<Account> chunk in chunks) {
        ThreadPool.QueueUserWorkItem(DoStuff, chunk);
    }
}

private static void DoStuff(Object parameter) {
    IList<Account> chunk = (IList<Account>) parameter;
    foreach (Account account in chunk) {
        // stuff
    }
}
As usual with multiple threads, you have to be very careful when accessing mutable shared state. You will have to make sure that everything you do in the DoStuff method does not cause undesired side effects.
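SplitIntoChunks is left as a TODO above; a minimal sketch of such an extension method could look like this:

public static class ChunkExtensions
{
    public static IList<IList<T>> SplitIntoChunks<T>(this IEnumerable<T> source, int chunkSize)
    {
        var chunks = new List<IList<T>>();
        var current = new List<T>(chunkSize);
        foreach (var item in source)
        {
            current.Add(item);
            if (current.Count == chunkSize)
            {
                chunks.Add(current);
                current = new List<T>(chunkSize);
            }
        }
        if (current.Count > 0)
            chunks.Add(current); // the final, possibly shorter, chunk
        return chunks;
    }
}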
foreach (var acc in linqFindAccount)
{
    string mailComponent = acc.Mail;
    Console.WriteLine(context.JobDetail.Name + " assigned to " + mailComponent);
    acc.Done = true;
    // stuff
}

Try the above.