Parallel.Invoke wait if tasks are busy - c#

Can someone help me with the following code, please? In the line:
Parallel.Invoke(parallelOptions, () => dosomething(message));
I want to invoke up to 5 parallel tasks (if 5 are busy, wait for the next one to become available, then start it; if only 4 are busy, start a 5th, etc.).
private AutoResetEvent autoResetEvent1 = new AutoResetEvent(false);
private ParallelOptions parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 5 };
private void threadProc()
{
queue.ReceiveCompleted += MyReceiveCompleted1;
Debug.WriteLine("about to start loop");
while (!shutdown)
{
queue.BeginReceive();
autoResetEvent1.WaitOne();
}
queue.ReceiveCompleted -= MyReceiveCompleted1;
queue.Dispose();
Debug.WriteLine("we are done");
}
private void MyReceiveCompleted1(object sender, ReceiveCompletedEventArgs e)
{
var message = queue.EndReceive(e.AsyncResult);
Debug.WriteLine("number of max tasks: " + parallelOptions.MaxDegreeOfParallelism);
Parallel.Invoke(parallelOptions, () => dosomething(message));
autoResetEvent1.Set();
}
private void dosomething(Message message)
{
//dummy body
var i = 0;
while (true)
{
Thread.Sleep(TimeSpan.FromSeconds(1));
i++;
if (i == 5 || i == 10 || i == 15) Debug.WriteLine("loop number: " + i + " on thread: " + Thread.CurrentThread.ManagedThreadId);
if (i == 15)
break;
}
Debug.WriteLine("finished task");
}
THE RESULTS I GET NOW:
1) With dosomething() as you see it above, I get only one at a time (it waits).
2) With dosomething() changed to the version below, I get a non-stop, unlimited number of tasks (MaxDegreeOfParallelism is not obeyed):
private async Task dosomething(Message message)
{
//dummy body
var i = 0;
while (true)
{
await Task.Delay(TimeSpan.FromSeconds(1));
i++;
if (i == 5 || i == 10 || i == 15) Debug.WriteLine("loop number: " + i + " on thread: " + Thread.CurrentThread.ManagedThreadId);
if (i == 15)
break;
}
Debug.WriteLine("finished task");
}
What am I doing wrong, and how do I get what I want accomplished?
THE OUTCOME I WANT:
In MyReceiveCompleted1, I want to make sure that only 5 tasks are processing messages simultaneously; if all 5 are busy, wait for one to become available.

This line of code:
Parallel.Invoke(parallelOptions, () => dosomething(message));
is telling the TPL to start a new parallel operation with only one thing to do. So, the "max parallelism" option of 5 is kind of meaningless, since there will only be one thing to do. autoResetEvent1 ensures that there will only be one parallel operation at a time, and each parallel operation only has one thing to do, so the observed behavior of only one thing running at a time is entirely expected.
When you change the delegate to be asynchronous, what is actually happening is that the MaxDegreeOfParallelism is only applied to the synchronous portion of the method. So once it hits its first await, it "leaves" the parallel operation and is no longer constrained by it.
The core problem is that Parallel works best when the number of operations is known ahead of time - which your code doesn't know; it's just reading them from a queue and processing them as they arrive. As someone commented, you could solve this using dynamic task parallelism with a TaskScheduler that limits concurrency.
However, the simplest solution is probably TPL Dataflow. You can create an ActionBlock<T> with the appropriate throttling option and send messages to it as they arrive:
// The block's item type must match what is posted: queue.EndReceive returns a Message.
private ActionBlock<Message> block = new ActionBlock<Message>(
message => dosomething(message),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
private void MyReceiveCompleted1(object sender, ReceiveCompletedEventArgs e)
{
var message = queue.EndReceive(e.AsyncResult);
block.Post(message);
autoResetEvent1.Set();
}
One nice aspect of TPL Dataflow is that it also understands asynchronous methods, and will interpret MaxDegreeOfParallelism as a maximum degree of concurrency.
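To make the last point concrete, here is a minimal sketch of an ActionBlock with an asynchronous delegate. The class name ThrottledBlockDemo, the 50 ms delay, and the peak-tracking counters are illustrative assumptions, not part of the original code; on some targets the System.Threading.Tasks.Dataflow NuGet package has to be referenced first.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottledBlockDemo
{
    // Posts `itemCount` items to an ActionBlock limited to `maxParallel`
    // concurrent handlers and returns the highest concurrency observed.
    public static async Task<int> RunAsync(int itemCount, int maxParallel)
    {
        int running = 0;   // handlers currently in flight
        int peak = 0;      // highest value of `running` seen so far

        var block = new ActionBlock<int>(
            async item =>
            {
                int now = Interlocked.Increment(ref running);
                // Record the peak with a compare-and-swap loop.
                int seen;
                while (now > (seen = Volatile.Read(ref peak)))
                    Interlocked.CompareExchange(ref peak, now, seen);

                await Task.Delay(50);   // stand-in for real asynchronous work
                Interlocked.Decrement(ref running);
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        for (int i = 0; i < itemCount; i++)
            block.Post(i);

        block.Complete();          // signal that no more input will arrive
        await block.Completion;    // wait for every handler to finish
        return peak;
    }
}
```

Calling ThrottledBlockDemo.RunAsync(20, 5) should report a peak no higher than 5, even though each handler spends most of its time in an await.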

Related

Issue with multithreading: program says it finished before the tasks have

I've written a method that handles multiple threads for Selenium web browsing. The issue is that if I input, for example, 4 tasks and 2 threads, the program says it has finished when only 2 tasks have completed.
Edit: Basically, I want the program to wait for the tasks to complete. I also want a thread that finishes while another is still running to start the next pending task right away, rather than waiting for the second thread to finish.
Thanks, and sorry for the code; I put it together quickly as an example of how it works.
{
static void Main(string[] args)
{
Threads(4, 4);
Console.WriteLine("Program has finished");
Console.ReadLine();
}
static Random ran = new Random();
static int loop;
public static void Threads(int number, int threads)
{
for (int i = 0; i < number; i++)
{
if (threads == 1)
{
generateDriver();
}
else if (threads > 1)
{
start:
if (loop < threads)
{
loop++;
Thread thread = new Thread(() => generateDriver());
thread.Start();
}
else
{
Task.Delay(2000).Wait();
goto start;
}
}
}
}
public static void test(IWebDriver driver)
{
driver.Navigate().GoToUrl("https://google.com/");
int timer = ran.Next(100, 2000);
Task.Delay(timer).Wait();
Console.WriteLine("[" + DateTime.Now.ToString("hh:mm:ss") + "] - " + "Task done.");
loop--;
driver.Close();
}
public static void generateDriver()
{
ChromeOptions options = new ChromeOptions();
options.AddArguments("--disable-dev-shm-usage");
options.AddArguments("--disable-extensions");
options.AddArguments("--disable-gpu");
options.AddArguments("window-size=1024,768");
options.AddArguments("--test-type");
ChromeDriverService service = ChromeDriverService.CreateDefaultService(Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location));
service.HideCommandPromptWindow = true;
service.SuppressInitialDiagnosticInformation = true;
IWebDriver driver = new ChromeDriver(service, options);
test(driver);
}
Manually keeping track of running threads, waiting for them to finish and reusing ones that are already finished is not trivial.
However the .NET runtime provides ready made solutions that you should prefer to handling it yourself.
The simplest way to achieve your desired result is to use a Parallel.For loop and set the MaxDegreeOfParallelism, e.g.:
public static void Threads(int number, int threads)
{
Parallel.For(0, number,
new ParallelOptions { MaxDegreeOfParallelism = threads },
_ => generateDriver());
}
If you really want to do it manually, you will need to use arrays of Thread (or Task) and keep iterating over them, checking whether each has finished and, if it has, replacing it with a new thread. This requires quite a bit more code than the Parallel.For solution (and is unlikely to perform better).
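The two properties the question asks for (the caller waits for all jobs, and a free slot is reused immediately) can be checked with a small sketch. The class BoundedParallelDemo and the 50 ms sleep standing in for generateDriver() are assumptions made for illustration only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedParallelDemo
{
    // Runs `number` jobs with at most `threads` running at once.
    // Returns how many jobs completed and the peak concurrency observed.
    public static (int Completed, int Peak) Run(int number, int threads)
    {
        int running = 0, peak = 0, completed = 0;

        Parallel.For(0, number,
            new ParallelOptions { MaxDegreeOfParallelism = threads },
            _ =>
            {
                int now = Interlocked.Increment(ref running);
                int seen;
                while (now > (seen = Volatile.Read(ref peak)))
                    Interlocked.CompareExchange(ref peak, now, seen);

                Thread.Sleep(50);   // stand-in for generateDriver()

                Interlocked.Decrement(ref running);
                Interlocked.Increment(ref completed);
            });

        // Parallel.For returns only after every iteration has finished,
        // so `completed` equals `number` at this point.
        return (completed, peak);
    }
}
```

With Run(4, 2), all four jobs are finished by the time the call returns, and no more than two run at the same time.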

Running multiple background tasks in a WPF application - UI update stalling

I have tried so many variants of this code. I'm receiving the same issue no matter what. The UI updating starts fine and then stalls until the entire process is complete. Can someone point me in the right direction?
The scenario
In a WPF application we will be calling the same API thousands of times with different parameters passed. We need to collect all the responses and do something.
Sample code
List<Task> tasks = new List<Task>();
for (int i = 1; i <= iterations; i++)
{
Task t = SampleTask(new SampleTaskParameterCollection { TaskId = i, Locker = locker, MinSleep = minSleep, MaxSleep = maxSleep });
tasks.Add(t);
}
Task.WhenAll(tasks);
private void SampleTask(SampleTaskParameterCollection parameters)
{
int sleepTime = rnd.Next(parameters.MinSleep, parameters.MaxSleep);
Thread.Sleep(sleepTime);
Application.Current.Dispatcher.BeginInvoke(new Action(() =>
{
lock (parameters.Locker)
{
ProgressBar1.Value = ProgressBar1.Value + 1;
LogTextbox.Text = LogTextbox.Text + Environment.NewLine + "Task " + parameters.TaskId + " slept for " + sleepTime + "ms and has now completed.";
}
LogTextbox.ScrollToEnd();
if (ProgressBar1.Maximum == ProgressBar1.Value)
{
RunSlowButton.IsEnabled = true;
RunFastButton.IsEnabled = true;
ProgressBar1.Value = 0;
}
}), System.Windows.Threading.DispatcherPriority.Send);
}
The current repo is located on GitHub. Look at the SimpleWindow.
Do not create thousands of tasks - this will cause immense performance problems.
Instead, use something like Parallel.For() to limit the number of tasks that run simultaneously; for example:
Parallel.For(1,
iterations + 1,
(index) =>
{
SampleTask(new SampleTaskParameterCollection { TaskId = index, Locker = locker, MinSleep = minSleep, MaxSleep = maxSleep });
});
Also if the UI updates take longer than the interval between the calls to BeginInvoke() then the invokes will begin to be queued up and things will get nasty.
To solve that, you could use a counter in your SampleTask() to only actually update the UI once every N calls (with a suitable value for N).
However, note that to avoid threading issues you'd have to use Interlocked.Increment() (or some other lock) when incrementing and checking the value of the counter. You'd also have to ensure that you updated the UI one last time when all the work is done.
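A minimal sketch of such a counter follows. ThrottledProgress is a hypothetical helper, not something from the question's repository; the callback it takes is where the Dispatcher.BeginInvoke() call would go in the WPF code.

```csharp
using System;
using System.Threading;

// Hypothetical helper: counts completed work items and invokes a callback
// only on every `interval`-th completion, plus once when `total` is reached.
public sealed class ThrottledProgress
{
    private readonly int _total;
    private readonly int _interval;
    private readonly Action<int> _report;
    private int _done;

    public ThrottledProgress(int total, int interval, Action<int> report)
    {
        _total = total;
        _interval = interval;
        _report = report;
    }

    // Safe to call from many threads at once: Interlocked.Increment hands
    // each caller a distinct value, so every threshold fires exactly once.
    public void ItemCompleted()
    {
        int done = Interlocked.Increment(ref _done);
        if (done % _interval == 0 || done == _total)
            _report(done);
    }
}
```

In SampleTask(), the call to ItemCompleted() would replace the unconditional BeginInvoke(), and the callback would set ProgressBar1.Value to the reported count. With 25 items and an interval of 10, the callback runs three times (at 10, 20, and 25), although the reports may arrive out of order.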

Simple way to rate limit HttpClient requests

I am using the HttpClient in System.Net.Http to make requests against an API. The API is limited to 10 requests per second.
My code is roughly like so:
// Start processing every item; ToList() forces the lazy Select to run.
var taskList = items.Select(i => ProcessItem(i)).ToList();
try
{
await Task.WhenAll(taskList.ToArray());
}
catch (Exception ex)
{
}
The ProcessItem method does a few things but always calls the API using the following:
await SendRequestAsync(..blah). Which looks like:
private async Task<Response> SendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
token.ThrowIfCancellationRequested();
var response = await HttpClient
.SendAsync(request: request, cancellationToken: token).ConfigureAwait(continueOnCapturedContext: false);
token.ThrowIfCancellationRequested();
return await Response.BuildResponse(response);
}
Originally the code worked fine, but when I started using Task.WhenAll I began getting 'Rate Limit Exceeded' messages from the API. How can I limit the rate at which requests are made?
It's worth noting that ProcessItem can make between 1 and 4 API calls depending on the item.
The API is limited to 10 requests per second.
Then just have your code do a batch of 10 requests, ensuring they take at least one second:
Items[] items = ...;
int index = 0;
while (index < items.Length)
{
var timer = Task.Delay(TimeSpan.FromSeconds(1.2)); // ".2" to make sure
var tasks = items.Skip(index).Take(10).Select(i => ProcessItemsAsync(i));
var tasksAndTimer = tasks.Concat(new[] { timer });
await Task.WhenAll(tasksAndTimer);
index += 10;
}
Update
My ProcessItems method makes 1-4 API calls depending on the item.
In this case, batching is not an appropriate solution. You need to limit an asynchronous method to a certain number, which implies a SemaphoreSlim. The tricky part is that you want to allow more calls over time.
I haven't tried this code, but the general idea I would go with is to have a periodic function that releases the semaphore up to 10 times. So, something like this:
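// Set maxCount explicitly: without it the count is effectively unbounded
// and Release() would not throw SemaphoreFullException.
private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(10, 10);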
private async Task<Response> ThrottledSendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
await _semaphore.WaitAsync(token);
return await SendRequestAsync(request, token);
}
private async Task PeriodicallyReleaseAsync(Task stop)
{
while (true)
{
var timer = Task.Delay(TimeSpan.FromSeconds(1.2));
if (await Task.WhenAny(timer, stop) == stop)
return;
// Release the semaphore at most 10 times.
for (int i = 0; i != 10; ++i)
{
try
{
_semaphore.Release();
}
catch (SemaphoreFullException)
{
break;
}
}
}
}
Usage:
// Start the periodic task, with a signal that we can use to stop it.
var stop = new TaskCompletionSource<object>();
var periodicTask = PeriodicallyReleaseAsync(stop.Task);
// Wait for all item processing.
await Task.WhenAll(taskList);
// Stop the periodic task.
stop.SetResult(null);
await periodicTask;
The answer is similar to this one.
Instead of using a list of tasks and WhenAll, use Parallel.ForEach with ParallelOptions to limit the number of concurrent operations to 10, and make sure each one takes at least 1 second. The delegate has to block rather than await: Parallel.ForEach does not wait for an async lambda, so an awaited delay would not throttle anything.
Parallel.ForEach(
items,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
item => {
ProcessItems(item);
Thread.Sleep(1000);
}
);
Or if you want to make sure each item takes as close to 1 second as possible:
Parallel.ForEach(
items,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
item => {
var watch = Stopwatch.StartNew();
ProcessItems(item);
watch.Stop();
if (watch.ElapsedMilliseconds < 1000) Thread.Sleep((int)(1000 - watch.ElapsedMilliseconds));
}
);
Or:
Parallel.ForEach(
items,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
item => {
// Blocks until both the 1-second delay and the processing are done.
Task.WaitAll(
Task.Delay(1000),
Task.Run(() => { ProcessItems(item); })
);
}
);
UPDATED ANSWER
My ProcessItems method makes 1-4 API calls depending on the item. So with a batch size of 10 I still exceed the rate limit.
You need to implement a rolling window in SendRequestAsync. A queue containing the timestamp of each request is a suitable data structure: you dequeue entries whose timestamp is older than the rate-limit window (1 second for a limit of 10 requests per second) and only send a request while the queue holds fewer than 10 entries. As it so happens, there is an implementation as an answer to a similar question on SO.
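A rolling window of this kind could be sketched as follows. RollingWindowLimiter is a hypothetical class written for illustration, not the implementation from the linked answer; it uses a Stopwatch rather than wall-clock time so that the timestamps are monotonic, and the limit and window length are constructor parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical rolling-window limiter: at most `limit` calls to WaitAsync
// complete within any span of length `window`.
public sealed class RollingWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    private readonly Queue<TimeSpan> _stamps = new Queue<TimeSpan>();
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);

    public RollingWindowLimiter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    public async Task WaitAsync(CancellationToken token = default(CancellationToken))
    {
        await _gate.WaitAsync(token);   // one caller inspects the queue at a time
        try
        {
            while (true)
            {
                TimeSpan now = _clock.Elapsed;

                // Drop timestamps that have fallen out of the window.
                while (_stamps.Count > 0 && now - _stamps.Peek() >= _window)
                    _stamps.Dequeue();

                if (_stamps.Count < _limit)
                {
                    _stamps.Enqueue(now);
                    return;
                }

                // Window is full: wait until the oldest entry expires, then re-check.
                await Task.Delay(_stamps.Peek() + _window - now, token);
            }
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

SendRequestAsync would then begin with a call such as await limiter.WaitAsync(token), where limiter is a shared instance created with a limit of 10 and a window matching the API's rate-limit period.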
ORIGINAL ANSWER
May still be useful to others
One straightforward way to handle this is to batch your requests in groups of 10, run those concurrently, and then wait until a total of 1 second has elapsed (if it hasn't already). This will bring you in right at the rate limit if the batch of requests can complete within 1 second, but is less than optimal if the batch takes longer. Have a look at the .Batch() extension method in MoreLinq. The code would look approximately like this:
foreach (var taskList in tasks.Batch(10))
{
Stopwatch sw = Stopwatch.StartNew(); // From System.Diagnostics
await Task.WhenAll(taskList.ToArray());
// Pad the window slightly (1.1 s instead of 1.0 s) in case the rate
// limiting on the other side isn't perfectly implemented.
TimeSpan remaining = TimeSpan.FromSeconds(1.1) - sw.Elapsed;
if (remaining > TimeSpan.Zero)
{
await Task.Delay(remaining);
}
}
https://github.com/thomhurst/EnumerableAsyncProcessor
I've written a library to help with this sort of logic.
Usage would be:
var responses = await AsyncProcessorBuilder.WithItems(items) // Or Extension Method: items.ToAsyncProcessorBuilder()
.SelectAsync(item => ProcessItem(item), CancellationToken.None)
.ProcessInParallel(levelOfParallelism: 10, TimeSpan.FromSeconds(1));

TPL DataFlow Workflow

I have just started reading about TPL Dataflow and I find it really confusing. I have read many articles on the topic, but I am unable to digest it easily. Maybe it is difficult, or maybe I just haven't grasped the idea yet.
The reason I started looking into this is that I wanted to implement a scenario where tasks run in parallel but in order, and I found that TPL Dataflow can be used for this.
I am practicing both TPL and TPL Dataflow and am at a very beginner level, so I need help from experts who can point me in the right direction. In my test method I have done the following:
private void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
txtOutput.Clear();
ExecutionDataflowBlockOptions execOptions = new ExecutionDataflowBlockOptions();
execOptions.MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded;
ActionBlock<string> actionBlock = new ActionBlock<string>(async v =>
{
await Task.Delay(200);
await Task.Factory.StartNew(
() => txtOutput.Text += v + Environment.NewLine,
CancellationToken.None,
TaskCreationOptions.None,
scheduler
);
}, execOptions);
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i.ToString());
}
actionBlock.Complete();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
Now the procedure is both parallel and asynchronous (it does not freeze my UI), but the output is not generated in order, whereas I have read that TPL Dataflow keeps the order of the elements by default. So my guess is that the Task I have created is the culprit and that it does not output the strings in the correct order. Am I right?
If this is the case, how do I make this both asynchronous and in order?
I have tried to split the code into different methods, but this attempt failed: only a string is output to the textbox and nothing else happens.
private async void btnTPLDataFlow_Click(object sender, EventArgs e)
{
Stopwatch watch = new Stopwatch();
watch.Start();
await TPLDataFlowOperation();
watch.Stop();
lblTPLDataFlow.Text = Convert.ToString(watch.ElapsedMilliseconds / 1000);
}
public async Task TPLDataFlowOperation()
{
var actionBlock = new ActionBlock<int>(async values => txtOutput.Text += await ProcessValues(values) + Environment.NewLine,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded, TaskScheduler = scheduler });
for (int i = 1; i < 101; i++)
{
actionBlock.Post(i);
}
actionBlock.Complete();
await actionBlock.Completion;
}
private async Task<string> ProcessValues(int i)
{
await Task.Delay(200);
return "Test " + i;
}
I know I have written a bad piece of code but this is the first time I am experimenting with TPL Dataflow.
How do I make this Asynchronous and in order?
This is something of a contradiction. You can make concurrent tasks start in order, but you can't really guarantee that they will run or complete in order.
Let's examine your code and see what's happening.
First, you've selected DataflowBlockOptions.Unbounded. This tells TPL Dataflow that it shouldn't limit the number of tasks that it allows to run concurrently. Therefore, each of your tasks will start at more-or-less the same time, in order.
Your asynchronous operation begins with await Task.Delay(200). This will cause your method to be suspended and then resume after about 200 ms. However, this delay is not exact, and will vary from one invocation to the next. Also, the mechanism by which your code is resumed after the delay may take a variable amount of time. Because of this random variation in the actual delay, the next bit of code to run is no longer in order, which results in the discrepancy you're seeing.
You might find this example interesting. It's a console application to simplify things a bit.
class Program
{
static void Main(string[] args)
{
OutputNumbersWithDataflow();
OutputNumbersWithParallelLinq();
Console.ReadLine();
}
private static async Task HandleStringAsync(string s)
{
await Task.Delay(200);
Console.WriteLine("Handled {0}.", s);
}
private static void OutputNumbersWithDataflow()
{
var block = new ActionBlock<string>(
HandleStringAsync,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
for (int i = 0; i < 20; i++)
{
block.Post(i.ToString());
}
block.Complete();
block.Completion.Wait();
}
private static string HandleString(string s)
{
// Perform some computation on s...
Thread.Sleep(200);
return s;
}
private static void OutputNumbersWithParallelLinq()
{
var myNumbers = Enumerable.Range(0, 20).AsParallel()
.AsOrdered()
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.WithMergeOptions(ParallelMergeOptions.NotBuffered);
var processed = from i in myNumbers
select HandleString(i.ToString());
foreach (var s in processed)
{
Console.WriteLine(s);
}
}
}
The first set of numbers is calculated using a method rather similar to yours—with TPL Dataflow. The numbers are out-of-order.
The second set of numbers, output by OutputNumbersWithParallelLinq(), doesn't use Dataflow at all. It relies on the Parallel LINQ features built into .NET. This runs my HandleString() method on background threads, but keeps the data in order through to the end.
The limitation here is that PLINQ doesn't let you supply an async method. (Well, you could, but it wouldn't give you the desired behavior.) HandleString() is a conventional synchronous method; it just gets executed on a background thread.
And here's a more complex Dataflow example that does preserve the correct order:
private static void OutputNumbersWithDataflowTransformBlock()
{
Random r = new Random();
var transformBlock = new TransformBlock<string, string>(
async s =>
{
// Make the delay extra random, just to be sure.
await Task.Delay(160 + r.Next(80));
return s;
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
// For a GUI application you should also set the
// scheduler here to make sure the output happens
// on the correct thread.
var outputBlock = new ActionBlock<string>(
s => Console.WriteLine("Handled {0}.", s),
new ExecutionDataflowBlockOptions
{
SingleProducerConstrained = true,
MaxDegreeOfParallelism = 1
});
transformBlock.LinkTo(outputBlock, new DataflowLinkOptions { PropagateCompletion = true });
for (int i = 0; i < 20; i++)
{
transformBlock.Post(i.ToString());
}
transformBlock.Complete();
outputBlock.Completion.Wait();
}

Control number of threads (not Task)

I have a list of files, and every file that I need to run (a PCAP file whose packets are transmitted) has its own running time.
Because I want the option to handle more than one file in parallel, I am using this function, which takes an IEnumerable<string> source and the maximum number of parallel threads:
public void doWork(IEnumerable<string> _source, int parallelThreads)
{
_tokenSource = new CancellationTokenSource();
var token = _tokenSource.Token;
Task.Factory.StartNew(() =>
{
try
{
Parallel.ForEach(_source,
new ParallelOptions
{
MaxDegreeOfParallelism = parallelThreads //limit number of parallel threads
},
file =>
{
if (token.IsCancellationRequested)
return;
//process my file...
});
}
catch (Exception)
{ }
}, _tokenSource.Token).ContinueWith(
t =>
{
//finish all the list...
}
, TaskScheduler.FromCurrentSynchronizationContext() //to ContinueWith (update UI) from UI thread
);
}
For example, if I have a list of 10 files and my maximum number of parallel threads is 4, my program starts transmitting 4 files in parallel, and after the first file finishes another file starts automatically. This works fine if I transmit the whole list once.
After adding the option to play the whole list in a loop, I have a problem: if I want to play the list twice, for example, the second loop begins after the first one ends, and in this second loop, once the first file finishes, the whole UI gets stuck and stops responding.
I talked with a friend of mine who is a C# developer, and he told me that this is probably a known issue with Task, which sometimes gets into a deadlock.
Is it possible to use another class instead of Task?
You shouldn't be using Parallel.ForEach for file IO. It is not a CPU-intensive task. You should be able to just start all the tasks one after the other. This way you will use far fewer threads and your application will be more scalable.
updated example:
public static void doWork(IEnumerable<string> _source, int numThreads)
{
var _tokenSource = new CancellationTokenSource();
List<Task> tasksToProcess = new List<Task>();
foreach (var file in _source)
{
tasksToProcess.Add( Task.Factory.StartNew(() =>
{
Console.WriteLine("Processing " + file );
//do file operation
Thread.Sleep(5000);
Console.WriteLine("Finished Processing " + file);
},
_tokenSource.Token));
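if(tasksToProcess.Count % numThreads == 0)
{
Console.WriteLine("Waiting for tasks");
Task.WaitAll(tasksToProcess.ToArray(), _tokenSource.Token);
Console.WriteLine("All tasks finished");
tasksToProcess.Clear();
}
}
// Wait for the final, partially filled batch (if any).
if (tasksToProcess.Count > 0)
{
Task.WaitAll(tasksToProcess.ToArray(), _tokenSource.Token);
}
}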
void Main()
{
var fileList = Enumerable.Range(0, 100).Select (e => "file" + e.ToString());
doWork(fileList, 4);
Console.ReadLine();
}
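The batch approach above waits for a whole group to finish before starting the next one. If the next file should start as soon as any slot frees up, as described in the question, one alternative is to throttle with a SemaphoreSlim. This is a different technique from the answer's code, shown only as a sketch; ThrottledFiles and the processFile delegate are hypothetical names standing in for the real per-file work.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledFiles
{
    // Processes every file, keeping at most `maxParallel` in flight.
    // `processFile` is a placeholder for the real per-file work.
    public static async Task ProcessAllAsync(
        IEnumerable<string> files, int maxParallel,
        Func<string, Task> processFile, CancellationToken token)
    {
        var slots = new SemaphoreSlim(maxParallel, maxParallel);
        var tasks = new List<Task>();

        foreach (string file in files)
        {
            await slots.WaitAsync(token);   // wait for a free slot
            tasks.Add(RunOneAsync(file));   // start the file without awaiting it
        }

        await Task.WhenAll(tasks);          // wait for the files still running

        async Task RunOneAsync(string file)
        {
            try { await processFile(file); }
            finally { slots.Release(); }    // free the slot even on failure
        }
    }
}
```

Because the loop awaits rather than blocks, it can be called from a UI event handler without freezing the window while files are being processed.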
