I'm coding my own HttpClient that should handle HTTP 429 (TooManyRequests) responses. I'm executing a single method of the client in parallel. As soon as I get a 429 status code as a response, I would like to pause the execution of all tasks that are currently calling the method.
Currently, I'm using very old code from an old MS DevBlog: PauseToken/Source
private readonly HttpClient _client;
private readonly PauseTokenSource PauseSource;
private readonly PauseToken PauseToken;
public MyHttpClient(HttpClient client)
{
_client = client;
PauseSource = new();
PauseToken = PauseSource.Token;
}
public async Task<HttpResponseMessage> PostAsJsonAsync<TValue>(string? requestUri, TValue value, CancellationToken cancellationToken = default)
{
try
{
await PauseToken.WaitWhilePausedAsync(); // I'd really like to pass the cancellationToken as well
HttpResponseMessage result = await _client.PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
if (result.StatusCode == HttpStatusCode.TooManyRequests)
{
PauseSource.IsPaused = true;
TimeSpan delay = (result.Headers.RetryAfter?.Date - DateTimeOffset.UtcNow) ?? TimeSpan.Zero;
await Task.Delay(delay, cancellationToken);
PauseSource.IsPaused = false;
return await PostAsJsonAsync(requestUri, value, cancellationToken);
}
return result;
}
finally
{
PauseSource.IsPaused = false;
}
}
MyHttpClient.PostAsJsonAsync is called like this:
private readonly MyHttpClient _client; // This gets injected by the constructor DI
private string ApiUrl; // This as well
public async Task SendToAPIAsync<T>(IEnumerable<T> items, CancellationToken cancellationToken = default)
{
IEnumerable<Task<HttpResponseMessage>> tasks = items.Select(item =>
_client.PostAsJsonAsync(ApiUrl, item, cancellationToken));
await Task.WhenAll(tasks).ConfigureAwait(false);
}
The items collection will contain 15'000 - 25'000 items. The API is unfortunately built so I have to make 1 request for each item.
I really dislike using old code like this, since I honestly don't even know what it does under the hood (the entire source code can be looked at in the linked article above). Also, I'd like to pass my cancellationToken to the WaitWhilePausedAsync() method since execution should be able to be cancelled at any time.
Is there really no easy way to "pause an async method"?
I've tried to store the DateTimeOffset I get from the result->RetryAfter in a local field, then just simply Task.Delay() the delta to DateTimeOffset.UtcNow, but that didn't seem to work and I also don't think it's very performant.
I like the idea of having a PauseToken but I think there might be better ways to do this nowadays.
I really dislike using old code like this
Just because code is old does not necessarily mean it is bad.
Also, I'd like to pass my cancellationToken to the WaitWhilePausedAsync() method since execution should be able to be cancelled at any time
As far as I can tell, WaitWhilePausedAsync just returns a task. If you want to abort as soon as the cancellation token is cancelled, you could use this answer for a WaitOrCancel extension, used like:
try{
await PauseToken.WaitWhilePausedAsync().WaitOrCancel(cancellationToken );
}
catch (OperationCanceledException) {
// handle cancel
}
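For readers without the linked answer at hand, here is a minimal sketch of what such an extension might look like. The name WaitOrCancel comes from the answer above; the body is an assumption of mine, not the linked code. It abandons the wait when the token fires; the underlying task is not stopped.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskCancellationExtensions
{
    // Hypothetical helper: completes when `task` completes, or throws
    // OperationCanceledException as soon as `token` is cancelled.
    // The underlying task keeps running; only the wait is abandoned.
    public static async Task WaitOrCancel(this Task task, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();

        var cancelled = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // The registration is disposed once the wait is over.
        using (token.Register(() => cancelled.TrySetResult(true)))
        {
            Task first = await Task.WhenAny(task, cancelled.Task).ConfigureAwait(false);
            if (first != task)
                throw new OperationCanceledException(token);
        }

        // Propagate the outcome (including any exception) of the original task.
        await task.ConfigureAwait(false);
    }
}
```

On .NET 6 or later, the built-in Task.WaitAsync(CancellationToken) covers the same need without a custom helper.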
Is there really no easy way to "pause an async method"?
To 'pause an async method' should mean that we await something, since we probably want to avoid blocking. That something needs to be a Task, so such a method would probably involve creating a TaskCompletionSource that can be awaited and that completes when unpaused. That seems to be more or less what your PauseToken does.
Note that any type of 'pausing' or 'cancellation' needs to be done cooperatively, so any pause feature needs to be built, and probably needs to be built by you if you are implementing your own client.
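To make the TaskCompletionSource idea concrete, here is a rough sketch of a pause source. SimplePauseSource is a hypothetical type written for illustration; it is not the DevBlog implementation. It assumes .NET 6 or later for Task.WaitAsync.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical type for illustration only.
public sealed class SimplePauseSource
{
    private readonly object _gate = new object();
    private TaskCompletionSource<bool> _resumed; // non-null only while paused

    public bool IsPaused
    {
        get { lock (_gate) { return _resumed != null; } }
        set
        {
            TaskCompletionSource<bool> toRelease = null;
            lock (_gate)
            {
                if (value && _resumed == null)
                {
                    // Pausing: create the task that waiters will await.
                    _resumed = new TaskCompletionSource<bool>(
                        TaskCreationOptions.RunContinuationsAsynchronously);
                }
                else if (!value && _resumed != null)
                {
                    // Resuming: detach the task so new callers pass straight through.
                    toRelease = _resumed;
                    _resumed = null;
                }
            }
            // Complete outside the lock so waiters never run while it is held.
            toRelease?.TrySetResult(true);
        }
    }

    // Completes immediately when not paused; otherwise when unpaused or cancelled.
    public Task WaitWhilePausedAsync(CancellationToken cancellationToken = default)
    {
        Task pending;
        lock (_gate) { pending = _resumed?.Task; }
        return pending == null
            ? Task.CompletedTask
            : pending.WaitAsync(cancellationToken); // assumes .NET 6+
    }
}
```

Callers cooperate by awaiting WaitWhilePausedAsync before each request; nothing is suspended unless the code asks to wait.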
But there might be alternative solutions. Maybe use a SemaphoreSlim for rate-limiting? Maybe just delay the request a bit if you get a TooManyRequests error? Maybe use a central queue of requests that can be throttled?
I ultimately created a library that contains a HttpClientHandler which handles these results for me. For anyone interested, here's the repo: github.com/baltermia/too-many-requests-handler (the NuGet package is linked in the readme).
A comment above led me to the solution below. I used the github.com/StephenCleary/AsyncEx library, which has both the PauseTokenSource and AsyncLock types that provided the functionality I was searching for.
private readonly AsyncLock _asyncLock = new();
private readonly HttpClient _client;
private readonly PauseTokenSource _pauseSource = new();
public PauseToken PauseToken { get; }
public MyHttpClient(HttpClient client)
{
_client = client;
PauseToken = _pauseSource.Token;
}
public async Task<HttpResponseMessage> PostAsJsonAsync<TValue>(string? requestUri, TValue value, CancellationToken cancellationToken = default)
{
{
// check if requests are paused and wait
await PauseToken.WaitWhilePausedAsync(cancellationToken).ConfigureAwait(false);
HttpResponseMessage result = await _client.PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
// if result is anything but 429, return (even if it is an error)
if (result.StatusCode != HttpStatusCode.TooManyRequests)
return result;
// create a locker which will unlock at the end of the stack
using IDisposable locker = await _asyncLock.LockAsync(cancellationToken).ConfigureAwait(false);
// calculate delay
DateTimeOffset? time = result.Headers.RetryAfter?.Date;
TimeSpan delay = time - DateTimeOffset.UtcNow ?? TimeSpan.Zero;
// if delay is 0 or below, return new requests
if (delay <= TimeSpan.Zero)
{
// very important to unlock
locker.Dispose();
// recursively recall itself
return await PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
}
try
{
// otherwise pause requests
_pauseSource.IsPaused = true;
// then wait the calculated delay
await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
}
finally
{
_pauseSource.IsPaused = false;
}
// make sure to unlock again (otherwise the method would lock itself because of recursion)
locker.Dispose();
// recursively recall itself
return await PostAsJsonAsync(requestUri, value, cancellationToken).ConfigureAwait(false);
}
}
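One thing to watch in the code above: it only reads RetryAfter.Date. HTTP allows Retry-After to be sent either as a date or as a number of seconds, and the seconds form shows up in RetryAfter.Delta instead. A small sketch that handles both might look like this (RetryAfterHelper and GetDelay are hypothetical names, not part of any library):

```csharp
using System;
using System.Net.Http.Headers;

public static class RetryAfterHelper
{
    // Hypothetical helper: turns a Retry-After header into a non-negative delay.
    // `now` is passed in so the calculation is easy to test.
    public static TimeSpan GetDelay(RetryConditionHeaderValue retryAfter, DateTimeOffset now)
    {
        if (retryAfter == null)
            return TimeSpan.Zero;

        // Delta is set for "Retry-After: 120", Date for an HTTP-date value.
        TimeSpan delay = retryAfter.Delta ?? (retryAfter.Date - now) ?? TimeSpan.Zero;

        // A date in the past means the caller may retry right away.
        return delay < TimeSpan.Zero ? TimeSpan.Zero : delay;
    }
}
```

It would be called as RetryAfterHelper.GetDelay(result.Headers.RetryAfter, DateTimeOffset.UtcNow) in place of the manual subtraction.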
I was recently exposed to the C# language and am working on getting data out of Cassandra. The code below gets the data from Cassandra and works fine.
The only problem I have is in my ProcessCassQuery method: I am passing CancellationToken.None to my requestExecuter function, which might not be the right thing to do. What is the right way to handle that case?
/**
*
* Below method does multiple async calls on each table for their corresponding id's by limiting it down using Semaphore.
*
*/
private async Task<List<T>> ProcessCassQueries<T>(IList<int> ids, Func<CancellationToken, int, Task<T>> mapperFunc, string msg) where T : class
{
var tasks = ids.Select(async id =>
{
await semaphore.WaitAsync();
try
{
return await ProcessCassQuery(ct => mapperFunc(ct, id), msg);
}
finally
{
semaphore.Release();
}
});
return (await Task.WhenAll(tasks)).Where(e => e != null).ToList();
}
// this might not be good idea to do it. how can I improve below method?
private Task<T> ProcessCassQuery<T>(Func<CancellationToken, Task<T>> requestExecuter, string msg) where T : class
{
return requestExecuter(CancellationToken.None);
}
As said in the official documentation, the cancellation token allows propagating a cancellation signal. This can be useful for example, to cancel long-running operations that for some reason do not make sense anymore or that are simply taking too long.
The CancellationTokenSource will allow you to get a custom token that you can pass to the requestExecuter. It will also provide the means for cancelling a running Task.
private CancellationTokenSource cts = new CancellationTokenSource();
// ...
private Task<T> ProcessCassQuery<T>(Func<CancellationToken, Task<T>> requestExecuter, string msg) where T : class
{
return requestExecuter(cts.Token);
}
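An alternative to a class-level source is to let the caller supply the token and pass it all the way down, including to the semaphore wait. The following is only a sketch under that assumption; ThrottledQueryRunner and RunAsync are hypothetical names, and mapperFunc stands in for the Cassandra call from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper for illustration.
public sealed class ThrottledQueryRunner
{
    private readonly SemaphoreSlim _semaphore;

    public ThrottledQueryRunner(int maxConcurrency)
    {
        _semaphore = new SemaphoreSlim(maxConcurrency);
    }

    public async Task<List<T>> RunAsync<T>(
        IEnumerable<int> ids,
        Func<CancellationToken, int, Task<T>> mapperFunc,
        CancellationToken cancellationToken = default) where T : class
    {
        var tasks = ids.Select(async id =>
        {
            // Cancelling the token also releases callers still queued here.
            await _semaphore.WaitAsync(cancellationToken).ConfigureAwait(false);
            try
            {
                return await mapperFunc(cancellationToken, id).ConfigureAwait(false);
            }
            finally
            {
                _semaphore.Release();
            }
        });

        T[] results = await Task.WhenAll(tasks).ConfigureAwait(false);
        return results.Where(r => r != null).ToList();
    }
}
```

The caller decides where the token comes from, for example a CancellationTokenSource with a timeout or a request-abort token.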
Example
Let's take a look at a different minimal/dummy example so we can look at the inside of it.
Consider the following method, GetSomethingAsync, which yields an incrementing integer every second.
The call to token.ThrowIfCancellationRequested throws an OperationCanceledException if cancellation was already requested before the loop starts. Inside the loop, Task.Delay observes the same token and throws a TaskCanceledException (a subclass of OperationCanceledException) when the token is cancelled. Other approaches can be taken; for example, check whether token.IsCancellationRequested is true and react to it.
private static async IAsyncEnumerable<int> GetSomethingAsync(CancellationToken token)
{
Console.WriteLine("starting to get something");
token.ThrowIfCancellationRequested();
for (var i = 0; i < 100; i++)
{
await Task.Delay(1000, token);
yield return i;
}
Console.WriteLine("finished getting something");
}
Now let's build the main method to call the above method.
public static async Task Main()
{
var cts = new CancellationTokenSource();
// cancel it after 3 seconds, just for demo purposes
cts.CancelAfter(3000);
// or: Task.Delay(3000).ContinueWith(_ => { cts.Cancel(); });
await foreach (var i in GetSomethingAsync(cts.Token))
{
Console.WriteLine(i);
}
}
If we run this, we will get an output that should look like:
starting to get something
0
1
Unhandled exception. System.Threading.Tasks.TaskCanceledException: A task was canceled.
Of course, this is just a dummy example. The cancellation could be triggered by a user action or some other event; it does not have to be a timer.
The following code gets a list of investments belonging to a customer from 3 different resources. The flow starts with a controller's call and follows the flow described below where all methods are declared as async and called with await operator.
I'm wondering if there is a problem with making all methods async. Is there any performance penalty? Is it a code smell or an anti-pattern?
I know there are things that must be awaited, like accessing a URL, getting data from the cache, etc. But I think things like filling a list or summing a few values don't need to be async.
The code follows below (some parts were omitted for clarity):
Controller
[HttpGet]
public async Task<IActionResult> Get()
{
Client client = await _mediator.Send(new RecoverInvestimentsQuery());
return Ok(client);
}
QueryHandler
public async Task<Client> Handle(RecoverInvestimentsQuery request, CancellationToken cancellationToken)
{
Client client;
List<Investiment> list = await _investimentBuilder.GetInvestiments();
client = new Client(request.Id, list);
return client;
}
InvestmentBuilder
public async Task<List<Investiment>> GetInvestiments()
{
ListInvestiments builder = new ListInvestiments();
await builder.BuildLists(_builder);
// here I get the List<Investiment> list already fulfilled to return to the controller
return list;
}
BuildLists
public async Task BuildLists(IBuilder builder)
{
Task[] tasks = new Task[] {
builder.GetFundsAsync(), //****
builder.ObterTesouro(),
builder.ObterRendaFixa()
};
await Task.WhenAll(tasks);
}
Funds, Bonds and Fixed Income Services (**** all 3 methods are the same, only their names vary, so I show just one of them to save space)
public async Task GetFundsAsync()
{
var listOfFunds = await _FundsService.RecoverFundsAsync();
// listOfFunds will get all items from all types of investments
}
The Recover Funds, Bonds and Fixed Income methods are the same too, so again I show just one of them
public async Task<List<Funds>> RecoverFundsAsync()
{
var returnCache = await _clientCache.GetValueAsync("fundsService");
// if not in cache, so go get from url
if (returnCache == null)
{
string url = _configuration.GetValue<string>("Urls:Funds");
var response = await _clientHttp.GetDataAsync(url);
if (response != null)
{
string funds = JObject.Parse(response).SelectToken("funds").ToString();
await _clientCache.SetValueAsync("fundsService", funds);
return JsonConvert.DeserializeObject<List<Funds>>(funds);
}
else
return null;
}
return JsonConvert.DeserializeObject<List<Funds>>(returnCache);
}
HTTP Client
public async Task<string> GetDataAsync(string Url)
{
using (HttpClient client = _clientFactory.CreateClient())
{
var response = await client.GetAsync(Url);
if (response.IsSuccessStatusCode)
return await response.Content.ReadAsStringAsync();
else
return null;
}
}
Cache Client
public async Task<string> GetValueAsync(string key)
{
IDatabase cache = Connection.GetDatabase();
RedisValue value = await cache.StringGetAsync(key);
if (value.HasValue)
return value.ToString();
else
return null;
}
Could someone give a thought about that?
Thanks in advance.
Your code looks okay to me. You are using async and await just for I/O and web access operations, which is exactly what async and await are meant for:
For I/O-bound code, you await an operation that returns a Task or Task<T> inside of an async method.
For CPU-bound code, you await an operation that is started on a background thread with the Task.Run method.
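As a rough illustration of where the async boundary can sit: only the I/O call is awaited, cheap in-memory work such as summing stays synchronous, and only genuinely heavy computation is pushed to Task.Run. All names below are made up for the example, and Task.Delay stands in for a real URL or cache call.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncBoundaries
{
    // Stand-in for an I/O-bound call (hypothetical).
    public static async Task<int[]> LoadValuesAsync()
    {
        await Task.Delay(50);
        return new[] { 1, 2, 3 };
    }

    // Cheap in-memory work: a plain synchronous method is enough.
    public static int Sum(int[] values) => values.Sum();

    // CPU-bound work that is worth offloading to a thread-pool thread.
    public static Task<long> SumOfSquaresAsync(int upTo) =>
        Task.Run(() =>
        {
            long total = 0;
            for (int i = 1; i <= upTo; i++)
                total += (long)i * i;
            return total;
        });

    // Async only because it awaits I/O; the summing itself is synchronous.
    public static async Task<int> LoadAndSumAsync()
    {
        int[] values = await LoadValuesAsync();
        return Sum(values);
    }
}
```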
Once you've used async and await, all pieces of your code tend to become asynchronous too. This is described well in the MSDN article Async/Await - Best Practices in Asynchronous Programming:
Asynchronous code reminds me of the story of a fellow who mentioned
that the world was suspended in space and was immediately challenged
by an elderly lady claiming that the world rested on the back of a
giant turtle. When the man enquired what the turtle was standing on,
the lady replied, “You’re very clever, young man, but it’s turtles all
the way down!” As you convert synchronous code to asynchronous code,
you’ll find that it works best if asynchronous code calls and is
called by other asynchronous code—all the way down (or “up,” if you
prefer). Others have also noticed the spreading behavior of
asynchronous programming and have called it “contagious” or compared
it to a zombie virus. Whether turtles or zombies, it’s definitely true
that asynchronous code tends to drive surrounding code to also be
asynchronous. This behavior is inherent in all types of asynchronous
programming, not just the new async/await keywords.
I have a thread which is responsible for calling a web API on 4 websites exactly every 2 seconds. The web API call should not be awaited, because if a website is not available the call waits 5 seconds for the timeout and the next website call is delayed.
As HttpClient in .NET 4.7.2 has only async methods, it should be used with await; if not, the compiler gives a warning and we may get unexpected behavior (as Microsoft says).
So should I use Task.Run or call ThreadPool.QueueUserWorkItem to make the web API calls in parallel?
Here is pseudocode:
public class Test1
{
private AutoResetEvent waitEvent = new AutoResetEvent(false);
private volatile bool _terminated = false;
public void Start()
{
Thread T = new Thread(ProcThread);
T.Start();
}
private async void ProcThread()
{
while (!_terminated)
{
await CallWebApi(); // <== this line
waitEvent.WaitOne(2000);
}
}
private async Task CallWebApi()
{
HttpClient client = new HttpClient();
.....
.....
}
}
So you have an async procedure that uses a HttpClient to fetch some information and process the fetched data:
async Task CallWebApiAsync() {...}
Improvement 1: it is good practice to suffix async methods with Async. This makes it possible for an async version to exist next to a non-async version that does something similar.
Inside this method you are using one of the HttpClient methods to fetch the information. As CallWebApiAsync is awaitable, I assume the async methods are used (GetAsync, GetStreamAsync, etc), and that the method only awaits when it needs the result of the async method.
The nice thing about this is that, as a user of CallWebApiAsync, as long as you don't await the call, you are free to do other things, even if the website isn't responding. The problem is: after 2 seconds, you want to call the method again. But what should you do if the method hasn't finished yet?
Improvement 2: because you want to be able to start a new Task while the previous one has not finished, remember the started tasks and throw them away when finished.
HashSet<Task> activeTasks = new HashSet<Task>(); // efficient add, lookup, and removal
void TaskStarted(Task startedTask)
{
// remember the startedTask
activeTasks.Add(startedTask);
}
void TaskCompleted(Task completedTask)
{
// If desired: log or process the results
LogFinishedTask(completedTask);
// Remove the completedTask from the set of ActiveTasks:
activeTasks.Remove(completedTask);
}
It might be handy to remove all completed tasks at once:
void RemoveCompletedTasks()
{
var completedTasks = activeTasks.Where(task => task.IsCompleted).ToList();
foreach (var task in completedTasks)
{
TaskCompleted(task);
}
}
Now we can adjust your ProcThread.
Improvement 3: in async-await always return Task instead of void and Task<TResult> instead of TResult. Only exception: event handlers return void.
async Task ProcThread()
{
// Repeatedly: start a task; remember it, and wait 2 seconds
TimeSpan waitTime = TimeSpan.FromSeconds(2);
while (!terminationRequested)
{
Task taskWebApi = CallWebApiAsync();
// You didn't await, so you are free to do other things
// Remember the task that you started.
this.TaskStarted(taskWebApi);
// wait a while before you start new task:
await Task.Delay(waitTime);
// before starting a new task, remove all completed tasks
this.RemoveCompletedTasks();
}
}
Improvement 4: Use TimeSpan.
TimeSpan.FromSeconds(2) is much easier to understand what it represents than a value 2000.
How to stop?
The problem is of course, after you request termination there might still be some tasks running. You'll have to wait for them to finish. But even then: some tasks might not finish at all within reasonable time.
Improvement 5: use CancellationToken to request cancellation.
To cancel tasks in a neat way, the CancellationToken type was introduced. Code that starts a task creates a CancellationTokenSource object and asks it for a CancellationToken. This token is passed to all async methods. As soon as the caller wants to cancel all tasks that were started using this CancellationTokenSource, it asks the CancellationTokenSource to cancel.
All tasks that have a token from this source have promised to regularly check the token to see if cancellation is requested. If so, the task does some cleanup (if needed) and returns.
Everything summarized in one class:
class Test1
{
private HttpClient httpClient = new HttpClient(...);
private HashSet<Task> activeTasks = new HashSet<Task>();
public async Task StartAsync(CancellationToken cancellationToken)
{
// repeated CallWebApiAsync until cancellation is requested
TimeSpan waitTime = TimeSpan.FromSeconds(2);
// repeat the following until OperationCancelled
try
{
while (true)
{
// stop if cancellation requested
cancellationToken.ThrowIfCancellationRequested();
var taskWebApi = this.CallWebApiAsync(cancellationToken);
this.activeTasks.Add(taskWebApi);
await Task.Delay(waitTime, cancellationToken);
// remove all completed tasks:
activeTasks.RemoveWhere(task => task.IsCompleted);
}
}
catch (OperationCanceledException)
{
// caller requested to cancel. Wait until all tasks are finished.
await Task.WhenAll(this.activeTasks);
// if desired do some logging for all tasks that were not completed.
}
}
And the adjusted CallWebApiAsync:
private async Task CallWebApiAsync(CancellationToken cancellationToken)
{
const string requestUri = ...
var httpResponseMessage = await this.httpClient.GetAsync(requestUri, cancellationToken);
// if here: cancellation not requested
this.ProcessHttpResponse(httpResponseMessage);
}
private void ProcessHttpResponse(HttpResponseMessage httpResponseMessage)
{
...
}
}
Usage:
CancellationTokenSource cancellationTokenSource = new CancellationTokenSource();
Test1 test = new Test1();
Task taskCallWebApiRepeatedly = test.StartAsync(cancellationTokenSource.Token);
// because you didn't await, you are free to do other things, while WebApi is called
// every 2 seconds
DoSomethingElse();
// you get bored. Request cancellation:
cancellationTokenSource.Cancel();
// of course you need to await until all tasks are finished:
await taskCallWebApiRepeatedly;
Because everyone promises to check regularly if cancellation is requested, you can be certain that within reasonable time all tasks are finished and have cleaned up their mess. The definition of "reasonable time" is arbitrary, but let's say, less than 100 msec?
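Since the class above leaves several parts elided, here is a compact, self-contained sketch of the same loop that can actually be run. PeriodicRunner is a hypothetical name, and the work delegate stands in for CallWebApiAsync. Note that completed tasks are simply dropped, so failures of individual calls are not observed here; real code would log them.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PeriodicRunner
{
    // Starts `work` every `interval` without awaiting it, until cancellation
    // is requested. Returns the number of calls that were started.
    public static async Task<int> RunAsync(
        Func<CancellationToken, Task> work,
        TimeSpan interval,
        CancellationToken cancellationToken)
    {
        var active = new HashSet<Task>();
        int started = 0;
        try
        {
            while (true)
            {
                cancellationToken.ThrowIfCancellationRequested();

                // Not awaited: a slow call must not delay the next start.
                active.Add(work(cancellationToken));
                started++;

                await Task.Delay(interval, cancellationToken).ConfigureAwait(false);

                // Forget calls that have finished in the meantime.
                active.RemoveWhere(task => task.IsCompleted);
            }
        }
        catch (OperationCanceledException)
        {
            // Cancellation requested: wait for in-flight calls. Their own
            // cancellation or failure is swallowed so shutdown always completes.
            try { await Task.WhenAll(active).ConfigureAwait(false); }
            catch (Exception) { }
        }
        return started;
    }
}
```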
If all you want is to execute a method every two seconds, then a System.Timers.Timer is probably the most suitable tool to use:
public class Test1
{
private readonly HttpClient _client;
private readonly System.Timers.Timer _timer;
public Test1()
{
_client = new HttpClient();
_timer = new System.Timers.Timer();
_timer.Interval = 2000;
_timer.Elapsed += Timer_Elapsed;
}
private void Timer_Elapsed(object sender, System.Timers.ElapsedEventArgs e)
{
var fireAndForgetTask = CallWebApiAsync();
}
private async Task CallWebApiAsync()
{
var html = await _client.GetStringAsync("http://example.com");
//...
}
public void Start() => _timer.Start();
public void Stop() => _timer.Stop();
}
Something like this. BTW, take this as pseudocode, as I am typing sitting on my bed :)
List<Task> tasks = new List<Task>();
tasks.Add(CallWebApi());
while (! await Task.WhenAny(tasks))
{
tasks.Add(CallWebApi()); // <== this line
await Task.Delay(2000);
}
I want to execute a bunch of WebRequests, but set a threshold on how many can be started simultaneously.
I came across this LimitedConcurrencyLevelTaskScheduler example and tried to utilize it like so
scheduler = new LimitedConcurrencyLevelTaskScheduler(1);
taskFactory = new TaskFactory(scheduler);
...
private Task<WebResponse> GetThrottledWebResponse(WebRequest request)
{
return taskFactory.FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null);
}
However I noticed that even with a max concurrency of 1, my tasks seemed to be completing in a non-FIFO order. When I put breakpoints in LimitedConcurrencyLevelTaskScheduler, it became apparent that it's not being used at all. I guess the way I'm using TaskFactory.FromAsync is not doing what I had expected.
Is there a proper way to throttle simultaneous WebRequests?
When I put breakpoints in LimitedConcurrencyLevelTaskScheduler, it became apparent that it's not being used at all
That is correct. FromAsync doesn't use the TaskFactory's scheduler at all. In fact, I don't really understand why this method isn't static.
You have multiple ways to implement the throttling. You could use the ActionBlock from Microsoft.Tpl.Dataflow, or you could make your own using SemaphoreSlim:
private static readonly SemaphoreSlim Semaphore = new SemaphoreSlim(1);
private static async Task<WebResponse> GetThrottledWebResponse(WebRequest request)
{
await Semaphore.WaitAsync().ConfigureAwait(false);
try
{
return await request.GetResponseAsync().ConfigureAwait(false);
}
finally
{
Semaphore.Release();
}
}
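For comparison, the ActionBlock route could look roughly like the sketch below. It assumes the TPL Dataflow library is available (the System.Threading.Tasks.Dataflow namespace); DataflowThrottle and ForEachThrottledAsync are hypothetical names, and handler stands in for the code that sends one request.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowThrottle
{
    // Runs `handler` for every item with at most `maxConcurrency` running at once.
    public static async Task ForEachThrottledAsync<T>(
        IEnumerable<T> items,
        Func<T, Task> handler,
        int maxConcurrency)
    {
        var block = new ActionBlock<T>(
            handler,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxConcurrency });

        // The block's input queue is unbounded by default, so Post does not block.
        foreach (T item in items)
            block.Post(item);

        // Signal that no more items are coming, then wait for the queue to drain.
        block.Complete();
        await block.Completion.ConfigureAwait(false);
    }
}
```

With maxConcurrency set to 1, the items are handled one at a time in the order they were posted.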
I have made a class to handle multiple HTTP GET requests. It looks something like this:
public partial class MyHttpClass : IDisposable
{
private HttpClient theClient;
private string ApiBaseUrl = "https://example.com/";
public MyHttpClass()
{
this.theClient = new HttpClient();
this.theClient.BaseAddress = new Uri(ApiBaseUrl);
this.theClient.DefaultRequestHeaders.Accept.Clear();
this.theClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
public async Task<JObject> GetAsync(string reqUrl)
{
var returnObj = new JObject();
var response = await this.theClient.GetAsync(reqUrl);
if (response.IsSuccessStatusCode)
{
returnObj = await response.Content.ReadAsAsync<JObject>();
Console.WriteLine("GET successful");
}
else
{
Console.WriteLine("GET failed");
}
return returnObj;
}
public void Dispose()
{
theClient.Dispose();
}
}
I am then queueing multiple requests by looping over Task.Run() and then calling Task.WaitAll(), in the manner of:
public async Task Start()
{
foreach(var item in list)
{
taskList.Add(Task.Run(() => this.GetThing(item)));
}
Task.WaitAll(taskList.ToArray());
}
public async Task GetThing(string url)
{
var response = await this.theClient.GetAsync(url);
// some code to process and save response
}
It definitely works faster than synchronous operation but it is not as fast as I expected. Based on other advice I think the local thread pool is slowing me down. MSDN suggests I should specify it as a long-running task, but I can't see a way to do that when calling it like this.
Right now I haven't got into limiting threads, I am just doing batches and testing speed to discover the right approach.
Can anyone suggest some areas for me to look at to increase the speed?
So, after you've set your DefaultConnectionLimit to a nice high number, or just the ConnectionLimit of the ServicePoint that manages connections to the host you are hitting:
ServicePointManager
.FindServicePoint(new Uri("https://example.com/"))
.ConnectionLimit = 1000;
the only suspect bit of code is where you start everything...
public async Task Start()
{
foreach(var item in list)
{
taskList.Add(Task.Run(() => this.GetThing(item)));
}
Task.WaitAll(taskList.ToArray());
}
This can be reduced to
var tasks = list.Select(this.GetThing);
to create the tasks (your async methods return hot (running) tasks... no need to double wrap with Task.Run)
Then, rather than blocking while waiting for them to complete, wait asynchronously instead:
await Task.WhenAll(tasks);
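Put together, the two steps might be wrapped like this. It is only a sketch; ConcurrentFetch and RunAllAsync are hypothetical names, and operation plays the role of GetThing. The ToList call is there because Select is lazy: materializing the sequence starts every operation exactly once, even if the task collection is enumerated again later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentFetch
{
    // Starts one operation per item and awaits them all without blocking a thread.
    // Results come back in the same order as the input items.
    public static async Task<TResult[]> RunAllAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TResult>> operation)
    {
        // Each call to `operation` returns an already-running task.
        List<Task<TResult>> tasks = items.Select(operation).ToList();
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }
}
```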
You are probably hitting some overhead in creating multiple instance-based HttpClient vs using a static instance. Your implementation will not scale. Using a shared HttpClient is actually recommended.
See my answer why - What is the overhead of creating a new HttpClient per call in a WebAPI client?