Throttle WebRequests

I want to execute a bunch of WebRequests, but set a threshold on how many can be started simultaneously.
I came across this LimitedConcurrencyLevelTaskScheduler example and tried to use it like so:
scheduler = new LimitedConcurrencyLevelTaskScheduler(1);
taskFactory = new TaskFactory(scheduler);
...
private Task<WebResponse> GetThrottledWebResponse(WebRequest request)
{
return taskFactory.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
}
However I noticed that even with a max concurrency of 1, my tasks seemed to be completing in a non-FIFO order. When I put breakpoints in LimitedConcurrencyLevelTaskScheduler, it became apparent that it's not being used at all. I guess the way I'm using TaskFactory.FromAsync is not doing what I had expected.
Is there a proper way to throttle simultaneous WebRequests?

When I put breakpoints in LimitedConcurrencyLevelTaskScheduler, it became apparent that it's not being used at all
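That is correct. FromAsync doesn't use the TaskFactory's scheduler at all: it calls BeginGetResponse immediately, so there is nothing for the scheduler to queue. In fact, I don't really understand why this method isn't static.
You have multiple ways to implement the throttling. You could use ActionBlock from Microsoft.Tpl.Dataflow (since published as System.Threading.Tasks.Dataflow). Or you could make your own using SemaphoreSlim: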
private static readonly SemaphoreSlim Semaphore = new SemaphoreSlim(1);
private static async Task<WebResponse> GetThrottledWebResponse(WebRequest request)
{
    await Semaphore.WaitAsync().ConfigureAwait(false);
    try
    {
        return await request.GetResponseAsync().ConfigureAwait(false);
    }
    finally
    {
        Semaphore.Release();
    }
}
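The ActionBlock alternative mentioned above could look roughly like the following. This is only a sketch: `ActionBlockThrottleDemo`, `FakeRequestAsync` and `RunAsync` are made-up names, the web request is replaced by a `Task.Delay` stand-in so the example runs without a network, and depending on the target framework you may need to reference the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockThrottleDemo
{
    private static int _running;
    private static int _peak;

    // Hypothetical stand-in for request.GetResponseAsync():
    // it only records how many calls overlap at any moment.
    private static async Task FakeRequestAsync(int id)
    {
        int now = Interlocked.Increment(ref _running);
        int seen;
        while (now > (seen = Volatile.Read(ref _peak)))
            Interlocked.CompareExchange(ref _peak, now, seen);

        await Task.Delay(50).ConfigureAwait(false);
        Interlocked.Decrement(ref _running);
    }

    // Posts requestCount items and returns the highest observed concurrency.
    public static async Task<int> RunAsync(int requestCount, int maxConcurrency)
    {
        _running = 0;
        _peak = 0;

        var block = new ActionBlock<int>(
            id => FakeRequestAsync(id),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxConcurrency });

        foreach (int id in Enumerable.Range(1, requestCount))
            block.Post(id);

        block.Complete();                              // no more items will be posted
        await block.Completion.ConfigureAwait(false);  // wait until the queue is drained
        return Volatile.Read(ref _peak);
    }
}
```

Calling `await ActionBlockThrottleDemo.RunAsync(20, 3)` should report a peak no higher than 3, because the block does not start a new delegate until a previous asynchronous one has completed.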

Related

Pause parallel execution of an asynchronous method

I'm coding my own HttpClient that should handle HTTP 429 (TooManyRequests) responses. I'm executing a single method in the client in parallel. As soon as I get a 429 status code as a response, I would like to pause the execution of all tasks that are currently calling the method.
Currently, I'm using very old code from an old MS DevBlog: PauseToken/Source
private readonly HttpClient _client;
private readonly PauseTokenSource PauseSource;
private readonly PauseToken PauseToken;
public MyHttpClient(HttpClient client)
{
_client = client;
PauseSource = new();
PauseToken = PauseSource.Token;
}
public async Task<HttpResponseMessage> PostAsJsonAsync<TValue>(string? requestUri, TValue value, CancellationToken cancellationToken = default)
{
    try
    {
        await PauseToken.WaitWhilePausedAsync(); // I'd really like to pass the cancellationToken as well
        HttpResponseMessage result = await _client.PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
        if (result.StatusCode == HttpStatusCode.TooManyRequests)
        {
            PauseSource.IsPaused = true;
            TimeSpan delay = (result.Headers.RetryAfter?.Date - DateTimeOffset.UtcNow) ?? TimeSpan.Zero;
            await Task.Delay(delay, cancellationToken);
            PauseSource.IsPaused = false;
            return await PostAsJsonAsync(requestUri, value, cancellationToken);
        }
        return result;
    }
    finally
    {
        PauseSource.IsPaused = false;
    }
}
MyHttpClient.PostAsJsonAsync is called like this:
private readonly MyHttpClient _client; // This gets injected by the constructor DI
private string ApiUrl; // This as well
public async Task SendToAPIAsync<T>(IEnumerable<T> items, CancellationToken cancellationToken = default)
{
    IEnumerable<Task<HttpResponseMessage>> tasks = items.Select(item =>
        _client.PostAsJsonAsync(ApiUrl, item, cancellationToken));
    await Task.WhenAll(tasks).ConfigureAwait(false);
}
The items collection will contain 15'000 - 25'000 items. The API is unfortunately built so I have to make 1 request for each item.
I really dislike using old code like this, since I honestly don't even know what it does under the hood (the entire source code can be looked at in the linked article above). Also, I'd like to pass my cancellationToken to the WaitWhilePausedAsync() method since execution should be able to be cancelled at any time.
Is there really no easy way to "pause an async method"?
I've tried to store the DateTimeOffset I get from the result->RetryAfter in a local field, then just simply Task.Delay() the delta to DateTimeOffset.UtcNow, but that didn't seem to work and I also don't think it's very performant.
I like the idea of having a PauseToken but I think there might be better ways to do this nowadays.

I really dislike using old code like this
Just because code is old does not necessarily mean it is bad.
Also, I'd like to pass my cancellationToken to the WaitWhilePausedAsync() method since execution should be able to be cancelled at any time
As far as I can tell, WaitWhilePausedAsync just returns a task. If you want to abort as soon as the cancellation token is cancelled, you could use this answer for a WaitOrCancel extension, used like:
try
{
    await PauseToken.WaitWhilePausedAsync().WaitOrCancel(cancellationToken);
}
catch (OperationCanceledException)
{
    // handle cancel
}
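The linked WaitOrCancel extension is not reproduced here, so the following is only a guess at what such a helper could look like, not the code from that answer. It races the original task against a TaskCompletionSource that completes when the token is cancelled. The class name `TaskCancellationExtensions` is invented for this sketch. On .NET 6 and later, the built-in `Task.WaitAsync(CancellationToken)` covers the same need.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskCancellationExtensions
{
    // Hypothetical helper: completes when `task` completes, or throws
    // OperationCanceledException as soon as `token` is cancelled.
    // The original task keeps running; only the wait is abandoned.
    public static async Task WaitOrCancel(this Task task, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();

        var cancelled = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // The registration is disposed once either side of the race finishes.
        using (token.Register(() => cancelled.TrySetResult(true)))
        {
            Task first = await Task.WhenAny(task, cancelled.Task).ConfigureAwait(false);
            if (first != task)
                throw new OperationCanceledException(token);
        }

        // Propagate the original task's exception, if it had one.
        await task.ConfigureAwait(false);
    }
}
```

Because this is cooperative only on the waiting side, cancelling the token does not stop whatever the original task is doing.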
Is there really no easy way to "pause an async method"?
To 'pause an async method' means we need to await something, since we probably want to avoid blocking. That something needs to be a Task, so such a method would probably involve creating a TaskCompletionSource that can be awaited and that completes when unpaused. That seems to be more or less what your PauseToken does.
Note that any type of 'pausing' or 'cancellation' needs to be done cooperatively, so any pause feature needs to be built, and probably needs to be built by you if you are implementing your own client.
But there might be alternative solutions. Maybe use a SemaphoreSlim for rate-limiting? Maybe just delay the request a bit if you get a TooManyRequests error? Maybe use a central queue of requests that can be throttled?
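As a rough illustration of the TaskCompletionSource idea described above, a pause gate could be sketched as below. This is not the PauseTokenSource from the blog post or from any library; `PauseGate` and its members are names made up for the sketch.

```csharp
using System.Threading.Tasks;

public sealed class PauseGate
{
    private readonly object _sync = new object();

    // Non-null only while paused; waiters await its Task.
    private TaskCompletionSource<bool> _resumed;

    public void Pause()
    {
        lock (_sync)
        {
            if (_resumed == null)
                _resumed = new TaskCompletionSource<bool>(
                    TaskCreationOptions.RunContinuationsAsynchronously);
        }
    }

    public void Resume()
    {
        TaskCompletionSource<bool> toRelease;
        lock (_sync)
        {
            toRelease = _resumed;
            _resumed = null;
        }

        // Completing the source releases every caller that is awaiting it.
        if (toRelease != null)
            toRelease.TrySetResult(true);
    }

    // Returns an already-completed task when not paused, so callers
    // continue without waiting; otherwise a task that completes on Resume.
    public Task WaitWhilePausedAsync()
    {
        lock (_sync)
        {
            if (_resumed == null)
                return Task.CompletedTask;
            return _resumed.Task;
        }
    }
}
```

The gate is cooperative: a caller is only held up at the points where it awaits `WaitWhilePausedAsync()`, and a request that is already in flight is not interrupted.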

I ultimately created a library that contains a HttpClientHandler which handles these results for me. For anyone interested, here's the repo: github.com/baltermia/too-many-requests-handler (the NuGet package is linked in the readme).
A comment above led me to the solution below. I used the github.com/StephenCleary/AsyncEx library, which has both the PauseTokenSource and AsyncLock types that provided the functionality I was looking for.
private readonly AsyncLock _asyncLock = new();
private readonly HttpClient _client;
private readonly PauseTokenSource _pauseSource = new();
public PauseToken PauseToken { get; }
public MyHttpClient(HttpClient client)
{
    _client = client;
    PauseToken = _pauseSource.Token;
}
public async Task<HttpResponseMessage> PostAsJsonAsync<TValue>(string? requestUri, TValue value, CancellationToken cancellationToken = default)
{
    // check if requests are paused and wait
    await PauseToken.WaitWhilePausedAsync(cancellationToken).ConfigureAwait(false);
    HttpResponseMessage result = await _client.PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
    // if the result is anything but 429, return it (even if it is an error)
    if (result.StatusCode != HttpStatusCode.TooManyRequests)
        return result;
    // acquire the lock; it is released explicitly before each recursive call below
    using IDisposable locker = await _asyncLock.LockAsync(cancellationToken).ConfigureAwait(false);
    // calculate delay
    DateTimeOffset? time = result.Headers.RetryAfter?.Date;
    TimeSpan delay = (time - DateTimeOffset.UtcNow) ?? TimeSpan.Zero;
    // if the delay is zero or negative, retry the request immediately
    if (delay <= TimeSpan.Zero)
    {
        // very important to unlock
        locker.Dispose();
        // recursively call itself
        return await PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
    }
    try
    {
        // otherwise pause requests
        _pauseSource.IsPaused = true;
        // then wait the calculated delay
        await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
    }
    finally
    {
        _pauseSource.IsPaused = false;
    }
    // make sure to unlock again (otherwise the method would lock itself because of recursion)
    locker.Dispose();
    // recursively call itself
    return await PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
}
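One detail worth checking in the delay calculation: a Retry-After header can carry either an absolute date or a number of seconds, and `RetryConditionHeaderValue` exposes these as `Date` and `Delta` respectively. The code above reads only `Date`, so a server that sends seconds would produce a zero delay and an immediate retry. A small helper that handles both forms might look like this; `RetryAfterHelper` and `GetDelay` are names invented for the sketch.

```csharp
using System;
using System.Net.Http.Headers;

public static class RetryAfterHelper
{
    // Hypothetical helper: turns a Retry-After header value into a
    // non-negative delay, whichever of the two forms the server used.
    public static TimeSpan GetDelay(RetryConditionHeaderValue retryAfter, DateTimeOffset now)
    {
        if (retryAfter == null)
            return TimeSpan.Zero;

        // "Retry-After: 120" is exposed as Delta.
        if (retryAfter.Delta.HasValue)
            return retryAfter.Delta.Value;

        // "Retry-After: <HTTP date>" is exposed as Date.
        if (retryAfter.Date.HasValue)
        {
            TimeSpan delay = retryAfter.Date.Value - now;
            return delay > TimeSpan.Zero ? delay : TimeSpan.Zero;
        }

        return TimeSpan.Zero;
    }
}
```

It would be called as `RetryAfterHelper.GetDelay(result.Headers.RetryAfter, DateTimeOffset.UtcNow)` in place of the two-line calculation.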

Run multiple asynchronous Tasks continuously

I would like to implement a pool of a predetermined number (let's say 10) of asynchronous tasks running indefinitely.
Using Task.WhenAll, I can easily start 10 tasks, feed them into a list, and call await Task.WhenAll(list) on this list. Once the method comes back, I can start again the whole process on the next 10 elements. The problem I face with this solution is that it waits for the longest task to complete before looping, which is not optimal.
What I would like is that anytime a task is completed, a new one is started. A timeout would be great as well, to prevent a task from running indefinitely in case of a failure.
Is there any simple way of doing this?

What I would like is that anytime a task is completed, a new one is started.
This is a perfect use case for SemaphoreSlim:
private readonly SemaphoreSlim _mutex = new SemaphoreSlim(10);
public async Task AddTask(Func<Task> work)
{
    await _mutex.WaitAsync();
    try { await work(); }
    finally { _mutex.Release(); }
}
A timeout would be great as well, to prevent a task from running indefinitely in case of a failure.
The standard pattern for timeouts is to use a CancellationTokenSource as the timer and pass a CancellationToken into the work that needs to support cancellation:
private readonly SemaphoreSlim _mutex = new SemaphoreSlim(10);
public async Task AddTask(Func<CancellationToken, Task> work)
{
    await _mutex.WaitAsync();
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(10));
    try { await work(cts.Token); }
    finally { _mutex.Release(); }
}
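To see the two pieces working together, here is a self-contained sketch that wraps the same pattern in a small class and feeds it 50 work items, some of which never finish on their own. `TaskPool`, `TaskPoolDemo` and the timings are illustrative assumptions. The work items are `Task.Delay` stand-ins, and a timed-out item is reported as `false` rather than throwing.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public sealed class TaskPool
{
    private readonly SemaphoreSlim _slots;
    private readonly TimeSpan _timeout;

    public TaskPool(int maxConcurrency, TimeSpan timeout)
    {
        _slots = new SemaphoreSlim(maxConcurrency);
        _timeout = timeout;
    }

    // Returns true if the work finished, false if the timeout cancelled it.
    public async Task<bool> RunAsync(Func<CancellationToken, Task> work)
    {
        await _slots.WaitAsync().ConfigureAwait(false);
        try
        {
            // The timer starts only once a slot has been acquired.
            using (var cts = new CancellationTokenSource(_timeout))
            {
                try
                {
                    await work(cts.Token).ConfigureAwait(false);
                    return true;
                }
                catch (OperationCanceledException) when (cts.IsCancellationRequested)
                {
                    return false;
                }
            }
        }
        finally
        {
            _slots.Release(); // frees the slot so the next queued item starts
        }
    }
}

public static class TaskPoolDemo
{
    public static async Task<int> CountCompletedAsync()
    {
        var pool = new TaskPool(10, TimeSpan.FromMilliseconds(200));

        // Every fifth item "hangs" until its token is cancelled.
        var runs = Enumerable.Range(0, 50).Select(i =>
            pool.RunAsync(token => Task.Delay(i % 5 == 0 ? Timeout.Infinite : 20, token)));

        bool[] results = await Task.WhenAll(runs).ConfigureAwait(false);
        return results.Count(ok => ok);
    }
}
```

The timeout only has an effect if the work actually observes the token it is given. Work that ignores the token keeps its slot until it finishes by itself.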

Locking issue with LimitedConcurrencyLevelTaskScheduler and async/await

I'm struggling to understand what's happening in this simple program.
In the example below I have a task factory that uses the LimitedConcurrencyLevelTaskScheduler from ParallelExtensionsExtras with maxDegreeOfParallelism set to 2.
I then start 2 tasks that each call an async method (e.g. an async Http request), then gets the awaiter and the result of the completed task.
The problem seems to be that Task.Delay(2000) never completes. If I set maxDegreeOfParallelism to 3 (or greater) it completes. But with maxDegreeOfParallelism = 2 (or less) my guess is that there is no thread available to complete the task. Why is that?
It seems to be related to async/await since if I remove it and simply do Task.Delay(2000).GetAwaiter().GetResult() in DoWork it works perfectly. Does async/await somehow use the parent task's task scheduler, or how is it connected?
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Schedulers;
namespace LimitedConcurrency
{
    class Program
    {
        static void Main(string[] args)
        {
            var test = new TaskSchedulerTest();
            test.Run();
        }
    }
    class TaskSchedulerTest
    {
        public void Run()
        {
            var scheduler = new LimitedConcurrencyLevelTaskScheduler(2);
            var taskFactory = new TaskFactory(scheduler);
            var tasks = Enumerable.Range(1, 2).Select(id => taskFactory.StartNew(() => DoWork(id)));
            Task.WaitAll(tasks.ToArray());
        }
        private void DoWork(int id)
        {
            Console.WriteLine($"Starting Work {id}");
            HttpClientGetAsync().GetAwaiter().GetResult();
            Console.WriteLine($"Finished Work {id}");
        }
        async Task HttpClientGetAsync()
        {
            await Task.Delay(2000);
        }
    }
}
Thanks in advance for any help

await by default captures the current context and uses that to resume the async method. This context is SynchronizationContext.Current, unless it is null, in which case it is TaskScheduler.Current.
In this case, await is capturing the LimitedConcurrencyLevelTaskScheduler used to execute DoWork. So, after starting the Task.Delay both times, both of those threads are blocked (due to the GetAwaiter().GetResult()). When the Task.Delay completes, the await schedules the remainder of the HttpClientGetAsync method to its context. However, the context will not run it since it already has 2 threads.
So you end up with threads blocked in the context until their async methods complete, but the async methods cannot complete until there is a free thread in the context; thus a deadlock. Very similar to the standard "don't block on async code" style of deadlock, just with n threads instead of one.
Clarifications:
The problem seems to be that Task.Delay(2000) never completes.
Task.Delay is completing, but the await cannot continue executing the async method.
If I set maxDegreeOfParallelism to 3 (or greater) it completes. But with maxDegreeOfParallelism = 2 (or less) my guess is that there is no thread available to complete the task. Why is that?
There are plenty of threads available. But the LimitedConcurrencyLevelTaskScheduler only allows 2 threads at a time to run in its context.
It seems to be related to async/await since if I remove it and simply do Task.Delay(2000).GetAwaiter().GetResult() in DoWork it works perfectly.
Yes; it's the await that is capturing the context. Task.Delay does not capture a context internally, so it can complete without needing to enter the LimitedConcurrencyLevelTaskScheduler.
Solution:
Task schedulers in general do not work very well with asynchronous code. This is because task schedulers were designed for Parallel Tasks rather than asynchronous tasks. So they only apply when code is running (or blocked). In this case, LimitedConcurrencyLevelTaskScheduler only "counts" code that's running; if you have a method that's doing an await, it won't "count" against that concurrency limit.
So, your code has ended up in a situation where it has the sync-over-async antipattern, probably because someone was trying to avoid the problem of await not working as expected with limited concurrency task schedulers. This sync-over-async antipattern has then caused the deadlock problem.
Now, you could add in more hacks by using ConfigureAwait(false) everywhere and continue blocking on asynchronous code, or you could fix it better.
A more proper fix would be to do asynchronous throttling. Toss out the LimitedConcurrencyLevelTaskScheduler completely; concurrency-limiting task schedulers only work with synchronous code, and your code is asynchronous. You can do asynchronous throttling using SemaphoreSlim, as such:
class TaskSchedulerTest
{
    private readonly SemaphoreSlim _mutex = new SemaphoreSlim(2);
    public async Task RunAsync()
    {
        var tasks = Enumerable.Range(1, 2).Select(id => DoWorkAsync(id));
        await Task.WhenAll(tasks);
    }
    private async Task DoWorkAsync(int id)
    {
        await _mutex.WaitAsync();
        try
        {
            Console.WriteLine($"Starting Work {id}");
            await HttpClientGetAsync();
            Console.WriteLine($"Finished Work {id}");
        }
        finally
        {
            _mutex.Release();
        }
    }
    async Task HttpClientGetAsync()
    {
        await Task.Delay(2000);
    }
}

I think you are encountering a sync deadlock. You are waiting for a thread to complete that is waiting for your thread to complete. Never going to happen. If you make your DoWork method async so you can await the HttpClientGetAsync() call, you'll avoid the deadlock.
using MassTransit.Util;
using System;
using System.Linq;
using System.Threading.Tasks;
//using System.Threading.Tasks.Schedulers;
namespace LimitedConcurrency
{
    class Program
    {
        static void Main(string[] args)
        {
            var test = new TaskSchedulerTest();
            test.Run();
        }
    }
    class TaskSchedulerTest
    {
        public void Run()
        {
            var scheduler = new LimitedConcurrencyLevelTaskScheduler(2);
            var taskFactory = new TaskFactory(scheduler);
            // StartNew with an async method returns Task<Task>; Unwrap so WaitAll waits for the inner work too
            var tasks = Enumerable.Range(1, 2).Select(id => taskFactory.StartNew(() => DoWork(id)).Unwrap());
            Task.WaitAll(tasks.ToArray());
        }
        private async Task DoWork(int id)
        {
            Console.WriteLine($"Starting Work {id}");
            await HttpClientGetAsync();
            Console.WriteLine($"Finished Work {id}");
        }
        async Task HttpClientGetAsync()
        {
            await Task.Delay(2000);
        }
    }
}
https://medium.com/rubrikkgroup/understanding-async-avoiding-deadlocks-e41f8f2c6f5d
TL;DR: never call .Result, which is effectively what .GetAwaiter().GetResult() was doing.

Producer/consumer pattern in .NET with the ability to wait for submitted tasks/jobs to complete

I am trying to get my head around the following design, but fail to get a clear picture.
I have a number of producers submitting tasks/jobs to a queue. A consumer/worker would then pick these up and process them. For now, there is only one consumer/worker.
So far, this sounds like the standard producer/consumer pattern which could be done with a BlockingCollection.
However, some producers might want to submit a task/job and be able to wait for its completion (or submit multiple tasks/jobs and wait for some or all of them, etc.), while other producers would just "fire&forget" their tasks/jobs.
(Note that this is not waiting for the queue to be empty, but waiting for a particular task/job).
How would this be done? In all examples I have seen, producers just post data to the queue using BlockingCollection.Add().
Any help would be highly appreciated.

A common approach is to wrap your work operations using a TaskCompletionSource whose Task can be returned to the caller and awaited on for completion.
public class ProducerConsumerQueue
{
    private readonly BlockingCollection<Action> queue = new BlockingCollection<Action>();
    public Task Produce(Action work)
    {
        var tcs = new TaskCompletionSource<bool>();
        Action action = () =>
        {
            try
            {
                work();
                tcs.SetResult(true);
            }
            catch (Exception ex)
            {
                tcs.SetException(ex);
            }
        };
        queue.Add(action);
        return tcs.Task;
    }
    public void RunConsumer(CancellationToken token)
    {
        while (true)
        {
            token.ThrowIfCancellationRequested();
            var action = queue.Take(token);
            action();
        }
    }
}
That said, you should consider leveraging the task infrastructure provided by TPL itself, rather than coming up with your own structures. If your only requirement is having a bounded number of consumers, you could use a LimitedConcurrencyLevelTaskScheduler.
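To illustrate how a producer can choose between waiting and fire-and-forget with the TaskCompletionSource approach, here is a compact variation that also returns a value. It is a sketch, not a drop-in replacement for the class above: `WorkQueue`, `WorkQueueDemo` and the generic `Produce<T>` overload are names made up for this example, and the consumer loop uses `GetConsumingEnumerable` so that it ends once `CompleteAdding` has been called.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class WorkQueue
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    // The returned task completes when the consumer has run the work item.
    public Task<T> Produce<T>(Func<T> work)
    {
        var tcs = new TaskCompletionSource<T>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        _queue.Add(() =>
        {
            try { tcs.SetResult(work()); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    public void CompleteAdding()
    {
        _queue.CompleteAdding();
    }

    // Blocks the calling thread until the queue is completed and drained.
    public void RunConsumer(CancellationToken token)
    {
        foreach (Action action in _queue.GetConsumingEnumerable(token))
            action();
    }
}

public static class WorkQueueDemo
{
    public static async Task<int> RunAsync()
    {
        var queue = new WorkQueue();
        Task consumer = Task.Run(() => queue.RunConsumer(CancellationToken.None));

        // Fire and forget: the returned task is simply ignored.
        _ = queue.Produce(() => { Console.WriteLine("background job"); return 0; });

        // Submit and wait: await the task for this particular job.
        int result = await queue.Produce(() => 6 * 7).ConfigureAwait(false);

        queue.CompleteAdding();
        await consumer.ConfigureAwait(false);
        return result;
    }
}
```

A producer that ignores the returned task also ignores any exception the job throws, so fire-and-forget jobs may want their own error handling inside the delegate.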

Is there a way to use Task<T> as a waithandle for a future value T?

I'd like to return a Task from a method so that the value can be delivered when it becomes available at a later time, and the caller can either block using Wait, attach a continuation, or even await it. The best I can think of is this:
public class Future<T> {
    private ManualResetEvent mre = new ManualResetEvent(false);
    private T value;
    public async Task<T> GetTask() {
        mre.WaitOne();
        return value;
    }
    public void Return(T value) {
        this.value = value;
        mre.Set();
    }
}
The main problem with that is that mre.WaitOne() is blocking, so I assume that every call to GetTask() will schedule a new thread to block. Is there a way to await a WaitHandle in an async manner, or is there already a helper for building the equivalent functionality?
Edit: OK, is TaskCompletionSource what I'm looking for, and am I just making life hard on myself?

Well, I guess I should have dug around a bit more before posting. TaskCompletionSource is exactly what I was looking for:
var tcs = new TaskCompletionSource<int>();
bool completed = false;
var tc = tcs.Task.ContinueWith(t => completed = t.IsCompleted);
tcs.SetResult(1);
tc.Wait();
Assert.IsTrue(completed);
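Applied to the Future<T> class from the question, the TaskCompletionSource approach could look like the sketch below. It keeps the question's `GetTask` and `Return` names. Returning a bool from `Return` and using `TrySetResult` are choices made for this example, not something the original post specifies.

```csharp
using System.Threading.Tasks;

public class Future<T>
{
    // RunContinuationsAsynchronously keeps awaiting code from running
    // inline on the thread that calls Return.
    private readonly TaskCompletionSource<T> _tcs =
        new TaskCompletionSource<T>(TaskCreationOptions.RunContinuationsAsynchronously);

    // No thread is blocked: callers can await this task, call Wait on it,
    // or attach a continuation with ContinueWith.
    public Task<T> GetTask()
    {
        return _tcs.Task;
    }

    // Returns false if a value was already supplied earlier.
    public bool Return(T value)
    {
        return _tcs.TrySetResult(value);
    }
}
```

Every call to `GetTask()` hands out the same task, so any number of callers can wait for the single value without each one tying up a thread.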

Blocking a thread by calling WaitHandle.WaitOne() is bad, but that's how events work, and the selected answer does not make a lot of sense, because it does not do anything asynchronously.
HOWEVER, the .NET framework can utilize worker threads from a thread pool to wait on multiple events at the same time (see ThreadPool.RegisterWaitForSingleObject) - this will improve overall resource utilization in your app if you need to wait on multiple WaitHandles at the same time.
So what you can do, is to register the WaitHandle for waiting on a worker thread, and set the callback with desired continuation.
With the AsyncWaitHandle extension library (NuGet) this can be done in one line:
var mre = new ManualResetEvent(false);
T myValue = ...;
Task<T> futureValueTask = mre.WaitOneAsync().ContinueWith(_ => myValue);
Overall, my humble suggestion is to review the code and do this instead:
async Task MyCode()
{
    var mre = new ManualResetEvent(false);
    StartDoingSmthAsynchronouslyThatPulsesTheEvent(mre);
    await mre;
    // finish routine here when the event fires
}

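Can't you just leverage TaskEx.WhenAll(t1, t2, t3...) to do the waiting? (TaskEx comes from the Async CTP; in .NET 4.5 and later the same methods are Task.WhenAll and Task.Run.) You'd need a real task that represents the doing of the work, but if you said something like: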
private Task<T> myRealWork;
private T value;
// ...
public async Task<T> GetTask()
{
    // WhenAll returns an array of results, one per task passed in
    value = (await TaskEx.WhenAll(myRealWork))[0];
    return value;
}
Although you can probably just await the myRealWork task as well. I don't see in your code where the value is actually being computed. That might require something like:
public async Task<T> GetTask()
{
    value = await TaskEx.RunEx(() => ComputeRealValueAndReturnIt());
    return value;
}
