I have made a class to handle multiple HTTP GET requests. It looks something like this:
public partial class MyHttpClass : IDisposable
{
private HttpClient theClient;
private string ApiBaseUrl = "https://example.com/";
public MyHttpClass()
{
this.theClient = new HttpClient();
this.theClient.BaseAddress = new Uri(ApiBaseUrl);
this.theClient.DefaultRequestHeaders.Accept.Clear();
this.theClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
}
public async Task<JObject> GetAsync(string reqUrl)
{
var returnObj = new JObject();
var response = await this.theClient.GetAsync(reqUrl);
if (response.IsSuccessStatusCode)
{
returnObj = await response.Content.ReadAsAsync<JObject>();
Console.WriteLine("GET successful");
}
else
{
Console.WriteLine("GET failed");
}
return returnObj;
}
public void Dispose()
{
theClient.Dispose();
}
}
I am then queueing multiple requets by using a loop over Task.Run() and then after Task.WaitAll() in the manner of:
public async Task Start()
{
foreach(var item in list)
{
taskList.Add(Task.Run(() => this.GetThing(item)));
}
Task.WaitAll(taskList.ToArray());
}
public async Task GetThing(string url)
{
var response = await this.theClient.GetAsync(url);
// some code to process and save response
}
It definitiely works faster than synchonus operation but it is not as fast as I expected. Based on other advice I think the local threadpool is slowing me down. MSDN suggest I should specify it as a long running task but I can't see a way to do that calling it like this.
Right now I haven't got into limiting threads, I am just doing batches and testing speed to discover the right approach.
Can anyone suggest some areas for me to look at to increase the speed?
So, after you've set your DefaultConnectionLimit to a nice high number, or just the ConnectionLimit of the ServicePoint that manages connections to the host you are hitting:
ServicePointManager
.FindServicePoint(new Uri("https://example.com/"))
.ConnectionLimit = 1000;
the only suspect bit of code is where you start everything...
public async Task Start()
{
foreach(var item in list)
{
taskList.Add(Task.Run(() => this.GetThing(item)));
}
Task.WaitAll(taskList.ToArray());
}
This can be reduced to
var tasks = list.Select(this.GetThing);
to create the tasks (your async methods return hot (running) tasks... no need to double wrap with Task.Run)
Then, rather that blocking while waiting for them to complete, wait asynchronously instead:
await Task.WhenAll(tasks);
You are probably hitting some overhead in creating multiple instance-based HttpClient vs using a static instance. Your implementation will not scale. Using a shared HttpClient is actually recommended.
See my answer why - What is the overhead of creating a new HttpClient per call in a WebAPI client?
Related
I have a list of URLs (thousands), I want to asynchronously get page data from each URL as fast as possible without putting extreme load on the CPU.
I have tried using threading but it still feels quite slow:
public static ConcurrentQueue<string> List = new ConcurrentQueue<string>(); //URL List (assume I added them already)
public static void Threading()
{
for(int i=0;i<100;i++) //100 threads
{
Thread thread = new Thread(new ThreadStart(Task));
thread.Start();
}
}
public static void Task()
{
while(!(List.isEmpty))
{
List.TryDequeue(out string URL);
//GET REQUEST HERE
}
}
Is there any better way to do this? I want to do this asynchronously but I can't figure out how to do it, and I don't want to sacrifice speed or CPU efficiency to do so.
Thanks :)
You should use Microsoft's Reactive Framework (aka Rx) - NuGet System.Reactive and add using System.Reactive.Linq; - then you can do this:
public static IObservable<(string url, string content)> GetAllUrls(List<string> urls) =>
Observable
.Using(
() => new HttpClient(),
hc =>
from url in urls.ToObservable()
from response in Observable.FromAsync(() => hc.GetAsync(url))
from content in Observable.FromAsync(() => response.Content.ReadAsStringAsync())
select (url, content));
That allows you to consume the results in a couple of ways.
You can process them as they get produced:
IDisposable subscription =
GetAllUrls(urlsx).Subscribe(x => Console.WriteLine(x.content));
Or you can get all of them produced and then await the full results:
(string url, string content)[] results = await GetAllUrls(urlsx).ToArray();
You are best off using HttpClient which allows async Task requests.
Just store each task in a list, and await the whole list. To prevent too many requests at once, wait for any single one to complete if there are too many, and remove the completed one from the list.
const int maxDegreeOfParallelism = 100;
static HttpClient _client = new HttpClient();
public static async Task GetAllUrls(List<string> urls)
{
var tasks = new List<Task>(urls.Count);
foreach (var url in urls)
{
if (tasks.Count == maxDegreeOfParallelism) // this prevents too many requests at once
tasks.Remove(await Task.WhenAny(tasks));
tasks.Add(GetUrl(url));
}
await Task.WhenAll(tasks);
}
private static async Task GetUrl(string url)
{
using var response = await _client.GetAsync(url);
// handle response here
var responseStr = await response.Content.ReadAsStringAsync(); // whatever
// do stuff etc
}
I have a problem with C# async and await when multiple requests hit my controller at the same time.
I need to requests execute one by one not all at the same time.
When the first request save data in DB then from queue get the next request
I think it's enough to put some synchronziation there. Let's assume following code:
public async Task GetData()
{
var result = await db.GetDataFromDatabase();
return result;
}
You can declare private field of type SemaphoreSlim for example:
private readonly SemaphoreSlim _getDataSem = new SemaphoreSlim(1, 1);
and then modify body of the method:
public async Task GetData()
{
_getDataSem.Wait();
try
{
var result = await db.GetDataFromDatabase();
return result;
}
finally { _getDataSem.Release(); }
}
There is wide range of ways to synchronize such code, like Semaphore, locks, etc.
I have few methods that report some data to Data base. We want to invoke all calls to Data service asynchronously. These calls to data service are all over and so we want to make sure that these DS calls are executed one after another in order at any given time. Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' . Will this be appropriate in my use case?
Or what other options will suit my use case? Please, guide me.
So, what i have in my code currently
public static SemaphoreSlim mutex = new SemaphoreSlim(1);
//first DS call
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
//await mutex.WaitAsync(); **//is this correct way to use SemaphoreSlim ?**
foreach (var setting in Module.param)
{
Task job1 = SaveModule(setting);
tasks1.Add(job1);
Task job2= SaveModule(GetAdvancedData(setting));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
//mutex.Release(); // **is this correct?**
}
private async Task SaveModule(Module setting)
{
await Task.Run(() =>
{
// Invokes Calls to DS
...
});
}
//somewhere down the main thread, invoking second call to DS
//Second DS Call
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
//await mutex.WaitAsync();// **is this correct?**
await Task.Run(() =>
{
//TrackInstrumentInfoToDS
//mutex.Release();// **is this correct?**
});
if(param2)
{
await Task.Run(() =>
{
//TrackParam2InstrumentInfoToDS
});
}
}
Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' .
SemaphoreSlim does restrict asynchronous code to running one at a time, and is a valid form of mutual exclusion. However, since "out of sequence" calls can cause errors, then SemaphoreSlim is not an appropriate solution since it does not guarantee FIFO.
In a more general sense, no synchronization primitive guarantees FIFO because that can cause problems due to side effects like lock convoys. On the other hand, it is natural for data structures to be strictly FIFO.
So, you'll need to use your own FIFO queue, rather than having an implicit execution queue. Channels is a nice, performant, async-compatible queue, but since you're on an older version of C#/.NET, BlockingCollection<T> would work:
public sealed class ExecutionQueue
{
private readonly BlockingCollection<Func<Task>> _queue = new BlockingCollection<Func<Task>>();
public ExecutionQueue() => Completion = Task.Run(() => ProcessQueueAsync());
public Task Completion { get; }
public void Complete() => _queue.CompleteAdding();
private async Task ProcessQueueAsync()
{
foreach (var value in _queue.GetConsumingEnumerable())
await value();
}
}
The only tricky part with this setup is how to queue work. From the perspective of the code queueing the work, they want to know when the lambda is executed, not when the lambda is queued. From the perspective of the queue method (which I'm calling Run), the method needs to complete its returned task only after the lambda is executed. So, you can write the queue method something like this:
public Task Run(Func<Task> lambda)
{
var tcs = new TaskCompletionSource<object>();
_queue.Add(async () =>
{
// Execute the lambda and propagate the results to the Task returned from Run
try
{
await lambda();
tcs.TrySetResult(null);
}
catch (OperationCanceledException ex)
{
tcs.TrySetCanceled(ex.CancellationToken);
}
catch (Exception ex)
{
tcs.TrySetException(ex);
}
});
return tcs.Task;
}
This queueing method isn't as perfect as it could be. If a task completes with more than one exception (this is normal for parallel code), only the first one is retained (this is normal for async code). There's also an edge case around OperationCanceledException handling. But this code is good enough for most cases.
Now you can use it like this:
public static ExecutionQueue _queue = new ExecutionQueue();
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
foreach (var setting in Module.param)
{
Task job1 = _queue.Run(() => SaveModule(setting));
tasks1.Add(job1);
Task job2 = _queue.Run(() => SaveModule(GetAdvancedData(setting)));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
}
Here's a compact solution that has the least amount of moving parts but still guarantees FIFO ordering (unlike some of the suggested SemaphoreSlim solutions). There are two overloads for Enqueue so you can enqueue tasks with and without return values.
using System;
using System.Threading;
using System.Threading.Tasks;
public class TaskQueue
{
private Task _previousTask = Task.CompletedTask;
public Task Enqueue(Func<Task> asyncAction)
{
return Enqueue(async () => {
await asyncAction().ConfigureAwait(false);
return true;
});
}
public async Task<T> Enqueue<T>(Func<Task<T>> asyncFunction)
{
var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
// get predecessor and wait until it's done. Also atomically swap in our own completion task.
await Interlocked.Exchange(ref _previousTask, tcs.Task).ConfigureAwait(false);
try
{
return await asyncFunction().ConfigureAwait(false);
}
finally
{
tcs.SetResult();
}
}
}
Please keep in mind that your first solution queueing all tasks to lists doesn't ensure that the tasks are executed one after another. They're all running in parallel because they're not awaited until the next tasks is startet.
So yes you've to use a SemapohoreSlim to use async locking and await. A simple implementation might be:
private readonly SemaphoreSlim _syncRoot = new SemaphoreSlim(1);
public async Task SendModuleDataToDSAsync(Module parameters)
{
await this._syncRoot.WaitAsync();
try
{
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
finally
{
this._syncRoot.Release();
}
}
If you can use Nito.AsyncEx the code can be simplified to:
public async Task SendModuleDataToDSAsync(Module parameters)
{
using var lockHandle = await this._syncRoot.LockAsync();
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
One option is to queue operations that will create tasks instead of queuing already running tasks as the code in the question does.
PseudoCode without locking:
Queue<Func<Task>> tasksQueue = new Queue<Func<Task>>();
async Task RunAllTasks()
{
while (tasksQueue.Count > 0)
{
var taskCreator = tasksQueue.Dequeu(); // get creator
var task = taskCreator(); // staring one task at a time here
await task; // wait till task completes
}
}
// note that declaring createSaveModuleTask does not
// start SaveModule task - it will only happen after this func is invoked
// inside RunAllTasks
Func<Task> createSaveModuleTask = () => SaveModule(setting);
tasksQueue.Add(createSaveModuleTask);
tasksQueue.Add(() => SaveModule(GetAdvancedData(setting)));
// no DB operations started at this point
// this will start tasks from the queue one by one.
await RunAllTasks();
Using ConcurrentQueue would be likely be right thing in actual code. You also would need to know total number of expected operations to stop when all are started and awaited one after another.
Building on your comment under Alexeis answer, your approch with the SemaphoreSlim is correct.
Assumeing that the methods SendInstrumentSettingsToDS and SendModuleDataToDSAsync are members of the same class. You simplay need a instance variable for a SemaphoreSlim and then at the start of each methode that needs synchornization call await lock.WaitAsync() and call lock.Release() in the finally block.
public async Task SendModuleDataToDSAsync(Module parameters)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
and it is importend that the call to lock.Release() is in the finally-block, so that if an exception is thrown somewhere in the code of the try-block the semaphore is released.
I need to call an API thousands of times as quickly as possible. The API has a limit of 10 calls per second. In order to take full advantage of the 10 calls per second without going over, I'm calling the API asynchronously and throttling the calls with a semaphore and a timer. My code enters the semaphore, calls the API, and then makes sure at least one second has passed before it releases the semaphore.
The API call is actually pretty quick and returns in about a second or less, so my code should move right on to the check time/release semaphore logic. However, what actually happens is after 10 calls the semaphore fills up and is not released at all until the rest of the async Tasks to call the API are created. After that everything works as expected, so I'm not experiencing any real issues. The behavior just seems strange and I would like to understand it.
public static class MyClass
{
SemaphoreSlim semaphore = new SemaphoreSlim(10);
public static async Task CreateCallApiTasks(IList<object> requests)
{
var callApiTasks = requests.Select(x => CallApi(x));
await Task.WhenAll(callApiTasks);
}
private static async Task CallApi(object requestBody)
{
using (var request = new HttpRequestMessage(HttpMethod.Post, <apiUri>))
{
request.Content = new StringContent(requestBody, System.Text.Encoding.UTF8, "application/json");
HttpResponseMessage response = null;
using (var httpClient = new HttpClient())
{
var throttle = new Stopwatch();
ExceptionDispatchInfo capturedException = null;
await semaphore.WaitAsync();
try
{
throttle.Start();
response = await httpClient.SendAsync(request);
while (throttle.ElapsedMilliseconds < 1000)
{
await Task.Delay(1);
}
semaphore.Release();
throttle.Stop();
}
catch (Exception ex)
{
capturedException = ExceptionDispatchInfo.Capture(ex);
}
if (capturedException != null)
{
while (throttle.ElapsedMilliseconds < 1000)
{
await Task.Delay(1);
}
semaphore.Release();
throttle.Stop();
capturedException.Throw();
}
}
}
}
}
I'm trying to figure out if using aysnc/await will help application throughput when using HttpClient to POST to an external api.
Scenario: I have a class that POST's data to a payment processors web api. There are 4 steps to POST a payment:
1 - POST Contact
2 - POST Transaction
3 - POST Donation
4 - POST Credit Card Payment
Steps 1 - 4 must be sequential in order specified above.
My application does not have any "busy work" to do when waiting for a response from the payment processor - in this scenario does using async/await for the operations below make sense? Will it increase application throughput during high volume? Thanks!
Edit: (question was marked as not clear)
1. My application is a web api (microservice)
2. I'm using .Result (blocking) to avoid async/await (clearly this is wrong!)
3. We will have "spike" loads of 1000 req/minute
public virtual ConstituentResponse PostConstituent(Constituent constituent)
{
var response = PostToUrl<Constituent>("/api/Constituents", constituent);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<ConstituentResponse>().Result;
}
public virtual TransactionResponse PostTransaction(Transaction transaction)
{
var response = PostToUrl<Transaction>("/api/Transactions", transaction);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<TransactionResponse>().Result;
}
public virtual DonationResponse PostDonation(Donation donation)
{
var response = PostToUrl<Donation>("/api/Donations", donation);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<DonationResponse>().Result;
}
public virtual CreditCardPaymentResponse PostCreditCardPayment(CreditCardPayment creditCardPayment)
{
var response = PostToUrl<CreditCardPayment>("/api/CreditCardPayments", creditCardPayment);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<CreditCardPaymentResponse>().Result;
}
protected virtual HttpResponseMessage PostToUrl<T>(string methodUri, T value)
{
return _httpClient.PostAsJsonAsync(methodUri, value).Result;
}
The five methods above are called from another class/function:
public virtual IPaymentResult Purchase(IDonationEntity donation, ICreditCard creditCard)
{
var constituentResponse = PostConstituent(donation);
var transactionResponse = PostTransaction(donation, constituentResponse);
var donationResponse = PostDonation(donation, constituentResponse, transactionResponse);
var creditCardPaymentResponse = PostCreditCardPayment(donation, creditCard, transactionResponse);
var paymentResult = new PaymentResult
{
Success = (creditCardPaymentResponse.Status == Constants.PaymentResult.Succeeded),
ExternalPaymentID = creditCardPaymentResponse.PaymentID,
ErrorMessage = creditCardPaymentResponse.ErrorMessage
};
return paymentResult;
}
You cannot actually utilize await Task.WhenAll here as when you are purchasing the next asynchronous operation relies on the result from the previous. As such you need to have them execute in the serialized manner. However, it is still highly recommended that you use async / await for I/O such as this, i.e.; web service calls.
The code is written with the consumption of Async* method calls, but instead of actually using the pattern -- it is blocking and could be a potential for deadlocks as well as undesired performance implications. you should only ever use .Result (and .Wait()) in console applications. Ideally, you should be using async / await. Here is the proper way to adjust the code.
public virtual async Task<ConstituentResponse> PostConstituenAsync(Constituent constituent)
{
var response = await PostToUrlAsync<Constituent>("/api/Constituents", constituent);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<ConstituentResponse>();
}
public virtual async Task<TransactionResponse PostTransactionAsync(Transaction transaction)
{
var response = await PostToUrl<Transaction>("/api/Transactions", transaction);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<TransactionResponse>();
}
public virtual async Task<DonationResponse> PostDonationAsync(Donation donation)
{
var response = await PostToUrl<Donation>("/api/Donations", donation);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<DonationResponse>();
}
public virtual async Task<CreditCardPaymentResponse> PostCreditCardPaymentAsync(CreditCardPayment creditCardPayment)
{
var response = await PostToUrlAsync<CreditCardPayment>("/api/CreditCardPayments", creditCardPayment);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<CreditCardPaymentResponse>();
}
protected virtual Task<HttpResponseMessage> PostToUrlAsync<T>(string methodUri, T value)
{
return _httpClient.PostAsJsonAsync(methodUri, value);
}
Usage
public virtual await Task<IPaymentResult> PurchaseAsync(IDonationEntity donation, ICreditCard creditCard)
{
var constituentResponse = await PostConstituentAsync(donation);
var transactionResponse = await PostTransactionAsync(donation, constituentResponse);
var donationResponse = await PostDonationAsync(donation, constituentResponse, transactionResponse);
var creditCardPaymentResponse = await PostCreditCardPaymentAsync(donation, creditCard, transactionResponse);
var paymentResult = new PaymentResult
{
Success = (creditCardPaymentResponse.Status == Constants.PaymentResult.Succeeded),
ExternalPaymentID = creditCardPaymentResponse.PaymentID,
ErrorMessage = creditCardPaymentResponse.ErrorMessage
};
return paymentResult;
}
First of all the way the code is written now does not help at all because you are blocking all the time by calling Result. If this was a good thing to do, why wouldn't all APIs simply do this internally for you?! You can't cheat with async...
You will only see throughput gains if you exceed the capabilities of the thread pool which happens in the 100s of threads range.
he average number of threads needed is requestsPerSecond * requestDurationInSeconds. Plug in some numbers to see whether this is realistic for you.
I'll link you my standard posts on whether to go sync or async because I feel you don't have absolute clarity for when async IO is appropriate.
https://stackoverflow.com/a/25087273/122718 Why does the EF 6 tutorial use asychronous calls?
https://stackoverflow.com/a/12796711/122718 Should we switch to use async I/O by default?
Generally, it is appropriate when the wait times are long and there are many parallel requests running.
My application does not have any "busy work" to do when waiting for a response
The other requests coming in are such busy work.
Note, that when a thread is blocked the CPU is not blocked as well. Another thread can run.
When you are doing async/await, you should async all the day.
Read Async/Await - Best Practices in Asynchronous Programming
You need to make them return async
public virtual async Task ConstituentResponse PostConstituent(Constituent constituent)
{
var response = PostToUrl<Constituent>("/api/Constituents", constituent);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<ConstituentResponse>();
}
//...
//etc
And then from the main function
await Task.WhenAll(constituentResponse, transactionResponse, donationResponse, creditCardPaymentResponse);
Edit: Misread OP. Don't use await Task.WhenAll for synchronous calls