I did this project at university and am now trying to refactor it. One of the problems is my method for getting the top list, which consists of around 250 movies, i.e. 250 API calls. After that I render them all on my web page. The API I am using is OMDBAPI, and I fetch every movie individually, as you can see in the code below.
By default the web page loads 10 movies, but I can also load all of them, which is around 250.
I am trying to wrap my head around asynchronous programming. This method takes around 4-6 seconds to run according to a Stopwatch in C#, but I believe it should be possible to refactor and improve it. I am new to asynchronous programming; I have looked at the Microsoft documentation and several earlier questions here on SO, but I am not getting anywhere with speeding up the calls.
I have looked at using Parallel for this, but I think my problem should be solvable with async?
Using the Stopwatch, I have pinpointed the delay to come mostly from the code between the two //x markers.
Above all I would like to speed up the calls, but I would love tips on best practice with async programming as well.
public async Task<List<HomeTopListMovieDTO>> GetTopListAggregatedData(Parameter parameter)
{
List<Task<HomeTopListMovieDTO>> tasks = new List<Task<HomeTopListMovieDTO>>();
var toplist = await GetToplist(parameter);
//x
foreach (var movie in toplist)
{
tasks.Add(GetTopListMovieDetails(movie.ImdbID));
}
var results = Task.WhenAll(tasks);
//x
var tempToplist = toplist.ToArray();
for (int i = 0; i < tasks.Count; i++)
{
tasks[i].Result.NumberOfLikes = tempToplist[i].NumberOfLikes;
tasks[i].Result.NumberOfDislikes = tempToplist[i].NumberOfDislikes;
}
List<HomeTopListMovieDTO> toplistMovies = results.Result.ToList();
return toplistMovies;
}
public async Task<HomeTopListMovieDTO> GetTopListMovieDetails(string imdbId)
{
string urlString = baseUrl + "i=" + imdbId + accessKey;
return await apiWebClient.GetAsync<HomeTopListMovieDTO>(urlString);
}
public async Task<T> GetAsync<T>(string urlString)
{
using (HttpClient client = new HttpClient())
{
var response = await client.GetAsync(urlString,
HttpCompletionOption.ResponseHeadersRead);
response.EnsureSuccessStatusCode();
var data = await response.Content.ReadAsStringAsync();
var result = JsonConvert.DeserializeObject<T>(data);
return result;
}
}
Your async code looks okay. I would throttle it so that it makes no more than X parallel requests, using Partitioner / Parallel.ForEach instead, but the approach with WhenAll is also good enough unless you see connections refused because of port exhaustion or the API's DDoS protection.
You should reuse HttpClient; see https://www.aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong for details. In your case, create the HttpClient in the root method and pass it as a parameter to your async methods. HttpClient is thread safe and can be used in parallel calls.
You should also dispose the HttpResponseMessage.
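The throttling idea can be sketched as follows. This is a minimal sketch, not the approach named in the answer: it swaps Partitioner / Parallel.ForEach for a SemaphoreSlim gate, which stays fully asynchronous. The class name Throttled, the method SelectAsync and the maxConcurrency parameter are hypothetical names introduced for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs `work` for every item while allowing at most `maxConcurrency`
    // calls to be in flight at once. Results are returned in input order.
    public static async Task<TResult[]> SelectAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TResult>> work,
        int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release();          // free the slot even if work throws
                }
            }).ToList();                     // ToList starts every task exactly once

            // Task.WhenAll completes only after every task has finished,
            // so the semaphore is not disposed while still in use.
            return await Task.WhenAll(tasks);
        }
    }
}
```

With the question's code this might be called as `await Throttled.SelectAsync(toplist, m => GetTopListMovieDetails(m.ImdbID), 10)`, where the limit of 10 is an arbitrary starting point and GetTopListMovieDetails is assumed to use one shared HttpClient instance rather than creating a new one per call.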
The following code gets a list of investments belonging to a customer from 3 different resources. The flow starts with a controller call and follows the steps described below, where all methods are declared as async and called with the await operator.
I'm wondering whether there is a problem with making all methods async. Is there any performance penalty? Is it a code smell or an anti-pattern?
I know there are things that must be awaited, like accessing a URL, getting data from the cache, etc. But I think things like filling a list or summing a few values don't need to be async.
The code follows below (some parts were omitted for clarity):
Controller
[HttpGet]
public async Task<IActionResult> Get()
{
Client client = await _mediator.Send(new RecuperarInvestimentosQuery());
return Ok(client);
}
QueryHandler
public async Task<Client> Handle(RecoverInvestimentsQuery request, CancellationToken cancellationToken)
{
Client client;
List<Investiment> list = await _investimentBuilder.GetInvestiments();
client = new Client(request.Id, list);
return client;
}
InvestmentBuilder
public async Task<List<Investiment>> GetInvestiments()
{
ListInvestiments builder = new ListInvestiments();
await builder.BuildLists(_builder);
// here I get the List<Investiment> list already fulfilled to return to the controller
return list;
}
BuildLists
public async Task BuildLists(IBuilder builder)
{
Task[] tasks = new Task[] {
builder.GetFundsAsync(), //****
builder.ObterTesouro(),
builder.ObterRendaFixa()
};
await Task.WhenAll(tasks);
}
Funds, Bonds and Fixed Income Services (****all 3 methods are identical except for their names, so I show only one of them to save space)
public async Task GetFundsAsync()
{
var listOfFunds = await _FundsService.RecoverFundsAsync();
// listOfFunds will get all items from all types of investments
}
The Recover methods for Funds, Bonds and Fixed Income are identical too, so again I show only one of them
public async Task<List<Funds>> RecoverFundsAsync()
{
var returnCache = await _clientCache.GetValueAsync("fundsService");
// if not in cache, so go get from url
if (returnCache == null)
{
string url = _configuration.GetValue<string>("Urls:Funds");
var response = await _clienteHttp.ObterDadosAsync(url);
if (response != null)
{
string funds = JObject.Parse(response).SelectToken("funds").ToString();
// store under the same key that is read at the top of the method
await _clientCache.SetValueAsync("fundsService", funds);
return JsonConvert.DeserializeObject<List<Funds>>(funds);
}
else
return null;
}
return JsonConvert.DeserializeObject<List<Funds>>(returnCache);
}
HTTP Client
public async Task<string> GetDataAsync(string Url)
{
using (HttpClient client = _clientFactory.CreateClient())
{
var response = await client.GetAsync(Url);
if (response.IsSuccessStatusCode)
return await response.Content.ReadAsStringAsync();
else
return null;
}
}
Cache Client
public async Task<string> GetValueAsync(string key)
{
IDatabase cache = Connection.GetDatabase();
RedisValue value = await cache.StringGetAsync(key);
if (value.HasValue)
return value.ToString();
else
return null;
}
Could someone share their thoughts on this?
Thanks in advance.
Your code looks okay to me. You are using async and await only for I/O and web access operations, which is exactly what async and await are meant for:
For I/O-bound code, you await an operation that returns a Task or Task<T> inside of an async method.
For CPU-bound code, you await an operation that is started on a background thread with the Task.Run method.
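The two cases quoted above can be illustrated with a minimal sketch. The class AsyncKinds and both method names are hypothetical, and Task.Delay merely stands in for a real network or cache call.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncKinds
{
    // I/O-bound: await the Task-returning operation directly.
    // No extra thread is needed while the operation is pending.
    // Task.Delay is a placeholder for a real HTTP or Redis call.
    public static async Task<string> LoadAsync(string key)
    {
        await Task.Delay(50);
        return "value-for-" + key;
    }

    // CPU-bound: start the computation on the thread pool with Task.Run
    // and let the caller await the returned task.
    public static Task<long> SumOfSquaresAsync(int n)
    {
        return Task.Run(() =>
        {
            long sum = 0;
            for (int i = 1; i <= n; i++)
            {
                sum += (long)i * i;
            }
            return sum;
        });
    }
}
```

Small in-memory steps such as filling a list or summing a few values need neither form; they can stay as ordinary synchronous code inside an async method.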
Once you start using async and await, the rest of your code tends to become asynchronous too. This is described very well in the MSDN article Async/Await - Best Practices in Asynchronous Programming:
Asynchronous code reminds me of the story of a fellow who mentioned
that the world was suspended in space and was immediately challenged
by an elderly lady claiming that the world rested on the back of a
giant turtle. When the man enquired what the turtle was standing on,
the lady replied, “You’re very clever, young man, but it’s turtles all
the way down!” As you convert synchronous code to asynchronous code,
you’ll find that it works best if asynchronous code calls and is
called by other asynchronous code—all the way down (or “up,” if you
prefer). Others have also noticed the spreading behavior of
asynchronous programming and have called it “contagious” or compared
it to a zombie virus. Whether turtles or zombies, it’s definitely true
that asynchronous code tends to drive surrounding code to also be
asynchronous. This behavior is inherent in all types of asynchronous
programming, not just the new async/await keywords.
I have a Web API in .NET Core 3.1 in which I'm trying to get results from a third-party service. My API creates multiple requests to the same service; some parameters differ from request to request, so the results returned will differ, but the structure of each result is the same.
As all requests are independent of each other, I want to run them all in parallel. I want my API to return the first result as soon as it is received from the service, but I also want the other requests to keep running in the background and save their results in Redis.
I wrote some sample code to check whether this is possible:
[HttpPost]
[Route("Test")]
public async Task<SearchResponse> Test(SearchRequest req)
{
List<Task<SearchResponse>> TaskList = new List<Task<SearchResponse>>();
for (int i = 0; i < 10; i++)
{
SearchRequest copyReq = Util.Copy(req); // my util function to copy the request
copyReq.ChangedParameter = i; // This is an example, many param can changed
TaskList.Add(Task.Run(() => DoSomething(copyReq)));
}
var finishedTask = await Task.WhenAny(TaskList);
return await finishedTask;
}
private async Task<SearchResponse> DoSomething(SearchRequest req)
{
// Here calling the third party service
SearchResponse resp = await service.GetResultAsync(req);
// Saving the result in REDIS
RedisManager.Save("KEY",resp);
return resp;
}
Now I'm wondering whether this is the correct way to deal with this problem. If there is a better way, please guide me to it.
EDIT
Use Case scenario
I have created a web app which fetches results from my Web API and displays them.
The web app searches for a list of products (can be anything) by sending a request to my API. My API then creates different requests, because the sources of the results (let's say Site1 and Site2) can be different.
The third party handles all requests to the different sources (Site1 and Site2) and converts their results into my result structure. I only have to provide a parameter saying which site I want results from and then call the service at my end.
Now I want to send the results to my web app as soon as any source (Site1 or Site2) gives me a result, and in the background I want to save the result of the other source in Redis, so that I can fetch that too from my web app on a later request.
The code looks pretty good; there's only one adjustment I'd recommend: don't use Task.Run. Task.Run causes a thread switch, which is totally unnecessary here.
[HttpPost]
[Route("Test")]
public async Task<SearchResponse> Test(SearchRequest req)
{
var TaskList = new List<Task<SearchResponse>>();
for (int i = 0; i < 10; i++)
{
SearchRequest copyReq = Util.Copy(req); // my util function to copy the request
copyReq.ChangedParameter = i; // This is an example, many param can changed
TaskList.Add(DoSomething(copyReq));
}
return await await Task.WhenAny(TaskList);
}
private async Task<SearchResponse> DoSomething(SearchRequest req)
{
// Here calling the third party service
SearchResponse resp = await service.GetResultAsync(req);
// Saving the result in REDIS
RedisManager.Save("KEY",resp);
return resp;
}
Note that this is using fire-and-forget. In the general sense, fire-and-forget is dangerous, since it means you don't care if the code fails or if it even completes. In this case, since the code is only updating a cache, fire-and-forget is acceptable.
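The return-first, cache-the-rest pattern can be sketched without any external service. This is a sketch under stated assumptions: the class FirstResult, the in-memory Cache dictionary (standing in for Redis), the source names and the delays are all hypothetical. Unlike the code above, it stores each result under its own key so that later results do not overwrite earlier ones.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class FirstResult
{
    // Stand-in for Redis: one entry per source.
    public static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    // Simulated third-party call; each source answers after its own delay
    // and then writes its result to the cache.
    private static async Task<string> QueryAndCacheAsync(string source, int delayMs)
    {
        await Task.Delay(delayMs);
        string result = "result-from-" + source;
        Cache[source] = result;
        return result;
    }

    // Starts all queries without Task.Run, returns the first result to arrive,
    // and also hands back a task that completes when every query has finished.
    public static async Task<(string First, Task<string[]> All)> SearchAsync()
    {
        var tasks = new[]
        {
            QueryAndCacheAsync("site1", 200),
            QueryAndCacheAsync("site2", 20),
        };

        Task<string> winner = await Task.WhenAny(tasks);
        return (await winner, Task.WhenAll(tasks));
    }
}
```

A controller action would typically return only `First`. Exposing `All` is a design choice of this sketch, so that a caller can log failures from the remaining requests instead of dropping them silently.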
I have an API which needs to be run in a loop for Mass processing.
Current single API is:
public async Task<ActionResult<CombinedAddressResponse>> GetCombinedAddress(AddressRequestDto request)
We are not allowed to touch or modify the original single API. However, it can be run in bulk using a foreach statement. What is the best way to run this asynchronously without locks?
The current solution below just takes a list and loops over it. Would this be it?
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>();
foreach(AddressRequestDto request in requests)
{
var newCombinedAddress = (await GetCombinedAddress(request)).Value;
combinedAddressResponses.Add(newCombinedAddress);
}
return combinedAddressResponses;
}
Update:
In the debugger, I have to go to combinedAddressResponse.Result.Value;
combinedAddressResponse.Value is null.
Also, strangely, writing combinedAddressResponse.Result.Value gives the error below: "Action Result does not contain a definition for 'Value' and no accessible extension method..."
I'm writing this code off the top of my head without an IDE or sleep, so please comment if I'm missing something or there's a better way.
But effectively I think you want to run all your requests at once (not sequentially) doing something like this:
public async Task<ActionResult<List<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
var combinedAddressResponses = new List<CombinedAddressResponse>(requests.Count);
var tasks = new List<Task<ActionResult<CombinedAddressResponse>>>(requests.Count);
foreach (var request in requests)
{
tasks.Add(Task.Run(async () => await GetCombinedAddress(request)));
}
//This waits for all the tasks to complete
await Task.WhenAll(tasks);
combinedAddressResponses.AddRange(tasks.Select(x => x.Result.Value));
return combinedAddressResponses;
}
I'm looking for a way to speed things up and run in parallel. Thanks.
What you need is "asynchronous concurrency". I use the term "concurrency" to mean "doing more than one thing at a time", and "parallel" to mean "doing more than one thing at a time using threads". Since you're on ASP.NET, you don't want to use additional threads; you'd want to use a form of concurrency that works asynchronously (which uses fewer threads). So, Parallel and Task.Run should not be parts of your solution.
The way to do asynchronous concurrency is to build a collection of tasks, and then use await Task.WhenAll. E.g.:
public async Task<ActionResult<IReadOnlyList<CombinedAddressResponse>>> GetCombinedAddress(List<AddressRequestDto> requests)
{
// Build the collection of tasks by doing an asynchronous operation for each request.
var tasks = requests.Select(async request =>
{
var combinedAddressResponse = await GetCombinedAddress(request);
return combinedAddressResponse.Value;
}).ToList();
// Wait for all the tasks to complete and get the results.
var results = await Task.WhenAll(tasks);
return results;
}
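One property of this pattern worth seeing in isolation is that Task.WhenAll returns results in the order of the input tasks, not in completion order. A minimal sketch, where ConcurrentLookup and LookupAsync are hypothetical names and Task.Delay simulates the address service:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentLookup
{
    // Simulated lookup: higher ids finish sooner, so completion order
    // is the reverse of input order for ids 1..4.
    private static async Task<string> LookupAsync(int id)
    {
        await Task.Delay(Math.Max(0, 100 - id * 20));
        return "address-" + id;
    }

    public static async Task<string[]> LookupAllAsync(IEnumerable<int> ids)
    {
        // ToList materializes the query so each lookup is started exactly once.
        var tasks = ids.Select(id => LookupAsync(id)).ToList();

        // The result array lines up index-for-index with `tasks`.
        return await Task.WhenAll(tasks);
    }
}
```

Because of this ordering guarantee, each response can be matched back to its request by position, with no locks or shared list needed.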
I am using Parallel.Invoke to call a large array of Actions on a 4 core machine.
Each action makes a call to an external web api to retrieve a json package of info. That json package is then de-serialized into a series of objects. Each of those objects is then inserted into several tables via EntityFramework 6.
This will process around 2 thousand distinct IDs so I am trying to use the Parallel library to get as fast a through-put as possible.
My main:
private static void Main(string[] args)
{
var apiKey = "myKey";
List<string> caseIDs = new List<string>();
//read list of ids from DB
using (var db = new StagingContext())
{
caseIDs = db.BatchList.Where(b => b.CaseID!=null).Select(a => a.CaseID).Distinct().Take(5000).ToList();
}
List<Action> actions = new List<Action>();
foreach (var id in caseIDs)
{
var UniqueID = Guid.NewGuid();
actions.Add(() => GetRecords(id,"https://myAPIURL/{0}?api={1}&case={2}", apiKey, UniqueID));
}
ParallelOptions op = new ParallelOptions
{
CancellationToken = tok.Token,
MaxDegreeOfParallelism = 10
};
Parallel.Invoke(op, actions.ToArray());
Console.WriteLine("Done");
Console.ReadKey();
}
My action:
private static void GetRecords(string CaseID, string url, string apiKey, Guid UniqueID)
{
using (HttpClient client = new HttpClient())
{
var tmpUrl = string.Format(url, apiKey, CaseID);
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
var result = client.GetAsync(tmpUrl).Result;
var jsonString = result.Content.ReadAsStringAsync();
jsonString.Wait();
var myObjectList = new List<MyObject>();
if (!jsonString.Result.Contains("error"))
{
myObjectList.AddRange(JsonConvert.DeserializeObject<List<MyObject>>(jsonString.Result));
foreach (var item in myObjectList)
{
item.UniqueID = UniqueID;
}
}
//Write this out to DB
using (var db = new StagingContext())
{
var myMappedObjectList = myObjectList.Adapt<List<MyObject>>();
db.CaseAttributeHistories.AddRange(myMappedObjectList);
using (var scope = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = IsolationLevel.ReadUncommitted }))
{
db.SaveChanges();
scope.Complete();
}
}
}
}
When I process a smaller set of data, ~1000 records, it works pretty well. When I process a larger data set, >1400, I often get an
“A task was canceled.”
error.
I am new to Parallel and multithreading.
Is this a valid approach?
Is there a good way to track down what is causing the cancellation?
How would I handle/ignore the error and continue with the rest of the records?
Is there a better or faster pattern to use in this situation?
First, check for exceptions. Swallowing an exception is a deadly sin of exception handling, and unfortunately multithreading does that fully automatically. Normally you have to write code to swallow exceptions; in multithreading you have to write code to avoid it. I would advise reading these two articles on exception handling before you try your hand at multithreading:
http://blogs.msdn.com/b/ericlippert/archive/2008/09/10/vexing-exceptions.aspx
http://www.codeproject.com/Articles/9538/Exception-Handling-Best-Practices-in-NET
Second, making one call to a Web API per record is generally a bad idea. Please verify that you do not have a way to retrieve the data in bulk rather than piecemeal. Piecemeal retrieval often incurs more overhead than data.
Third, are you even allowed to automate it on that scale? If the API provider offers no bulk retrieval, they might not want automation on that scale. If so, they might notice the sudden increase in load and apply load throttling later. That could kill your program.
Fourth, multithreading an API call will probably not speed things up. The Web API and the network will very likely be the bottleneck. Multithreading only helps with CPU-bound operations. With network, disk, DB and similar operations, there will often be no performance increase, or even a performance decrease, as the multiple operations get in each other's way.
A bit of multitasking (even just a single alternate thread) is mandatory with network, disk and similar long-running operations. But actual multithreading rarely if ever helps.
I bet the exception is being thrown from client.GetAsync?
HttpClient will throw TaskCanceledException when the HTTP call times out. (i.e. the web service is not responding)
Annoying, I know.
It's possible that, because you're hitting it so hard, it can't keep up. You can try raising the Timeout property of your HttpClient, but the default is already 100 seconds.
If you want to just ignore those errors, then wrap the client.GetAsync(tmpUrl) in a try/catch block and just return (and maybe log it somewhere).
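The try/catch suggestion can be sketched as below. SlowHandler is a hypothetical test double that never responds, used only to provoke the timeout without a real server; SafeFetch and GetOrNullAsync are likewise illustrative names, and returning null on timeout is one possible policy rather than the only reasonable one.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Test double: waits until the request is cancelled, so every call times out.
public sealed class SlowHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(Timeout.InfiniteTimeSpan, cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}

public static class SafeFetch
{
    // Returns the response body, or null when the call timed out or was cancelled.
    public static async Task<string> GetOrNullAsync(HttpClient client, string url)
    {
        try
        {
            using (var response = await client.GetAsync(url))
            {
                return await response.Content.ReadAsStringAsync();
            }
        }
        catch (TaskCanceledException ex)
        {
            // Log and carry on so one slow record does not stop the whole batch.
            Console.Error.WriteLine("Request to " + url + " was cancelled: " + ex.Message);
            return null;
        }
    }
}
```

In the question's GetRecords method the same catch would wrap the GetAsync call, and a null result would mean "skip this case ID and continue with the next".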
I'm trying to figure out whether using async/await will help application throughput when using HttpClient to POST to an external API.
Scenario: I have a class that POSTs data to a payment processor's web API. There are 4 steps to POST a payment:
1 - POST Contact
2 - POST Transaction
3 - POST Donation
4 - POST Credit Card Payment
Steps 1 - 4 must be sequential in order specified above.
My application does not have any "busy work" to do when waiting for a response from the payment processor - in this scenario does using async/await for the operations below make sense? Will it increase application throughput during high volume? Thanks!
Edit: (question was marked as not clear)
1. My application is a web api (microservice)
2. I'm using .Result (blocking) to avoid async/await (clearly this is wrong!)
3. We will have "spike" loads of 1000 req/minute
public virtual ConstituentResponse PostConstituent(Constituent constituent)
{
var response = PostToUrl<Constituent>("/api/Constituents", constituent);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<ConstituentResponse>().Result;
}
public virtual TransactionResponse PostTransaction(Transaction transaction)
{
var response = PostToUrl<Transaction>("/api/Transactions", transaction);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<TransactionResponse>().Result;
}
public virtual DonationResponse PostDonation(Donation donation)
{
var response = PostToUrl<Donation>("/api/Donations", donation);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<DonationResponse>().Result;
}
public virtual CreditCardPaymentResponse PostCreditCardPayment(CreditCardPayment creditCardPayment)
{
var response = PostToUrl<CreditCardPayment>("/api/CreditCardPayments", creditCardPayment);
if (!response.IsSuccessStatusCode)
HandleError(response);
return response.Content.ReadAsAsync<CreditCardPaymentResponse>().Result;
}
protected virtual HttpResponseMessage PostToUrl<T>(string methodUri, T value)
{
return _httpClient.PostAsJsonAsync(methodUri, value).Result;
}
The five methods above are called from another class/function:
public virtual IPaymentResult Purchase(IDonationEntity donation, ICreditCard creditCard)
{
var constituentResponse = PostConstituent(donation);
var transactionResponse = PostTransaction(donation, constituentResponse);
var donationResponse = PostDonation(donation, constituentResponse, transactionResponse);
var creditCardPaymentResponse = PostCreditCardPayment(donation, creditCard, transactionResponse);
var paymentResult = new PaymentResult
{
Success = (creditCardPaymentResponse.Status == Constants.PaymentResult.Succeeded),
ExternalPaymentID = creditCardPaymentResponse.PaymentID,
ErrorMessage = creditCardPaymentResponse.ErrorMessage
};
return paymentResult;
}
You cannot actually use await Task.WhenAll here, because when you are purchasing, each asynchronous operation relies on the result of the previous one. As such, they need to execute serially. However, it is still highly recommended that you use async / await for I/O such as this, i.e. web service calls.
The code is written against the Async* methods, but instead of actually using the pattern it blocks, which risks deadlocks as well as poor performance. You should only ever use .Result (and .Wait()) in console applications. Ideally, you should be using async / await. Here is the proper way to adjust the code.
public virtual async Task<ConstituentResponse> PostConstituentAsync(Constituent constituent)
{
var response = await PostToUrlAsync<Constituent>("/api/Constituents", constituent);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<ConstituentResponse>();
}
public virtual async Task<TransactionResponse> PostTransactionAsync(Transaction transaction)
{
var response = await PostToUrlAsync<Transaction>("/api/Transactions", transaction);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<TransactionResponse>();
}
public virtual async Task<DonationResponse> PostDonationAsync(Donation donation)
{
var response = await PostToUrlAsync<Donation>("/api/Donations", donation);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<DonationResponse>();
}
public virtual async Task<CreditCardPaymentResponse> PostCreditCardPaymentAsync(CreditCardPayment creditCardPayment)
{
var response = await PostToUrlAsync<CreditCardPayment>("/api/CreditCardPayments", creditCardPayment);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<CreditCardPaymentResponse>();
}
protected virtual Task<HttpResponseMessage> PostToUrlAsync<T>(string methodUri, T value)
{
return _httpClient.PostAsJsonAsync(methodUri, value);
}
Usage
public virtual async Task<IPaymentResult> PurchaseAsync(IDonationEntity donation, ICreditCard creditCard)
{
var constituentResponse = await PostConstituentAsync(donation);
var transactionResponse = await PostTransactionAsync(donation, constituentResponse);
var donationResponse = await PostDonationAsync(donation, constituentResponse, transactionResponse);
var creditCardPaymentResponse = await PostCreditCardPaymentAsync(donation, creditCard, transactionResponse);
var paymentResult = new PaymentResult
{
Success = (creditCardPaymentResponse.Status == Constants.PaymentResult.Succeeded),
ExternalPaymentID = creditCardPaymentResponse.PaymentID,
ErrorMessage = creditCardPaymentResponse.ErrorMessage
};
return paymentResult;
}
First of all the way the code is written now does not help at all because you are blocking all the time by calling Result. If this was a good thing to do, why wouldn't all APIs simply do this internally for you?! You can't cheat with async...
You will only see throughput gains if you exceed the capabilities of the thread pool which happens in the 100s of threads range.
The average number of threads needed is requestsPerSecond * requestDurationInSeconds. Plug in some numbers to see whether this is realistic for you.
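As a worked example of that formula, the sketch below plugs in the spike load from the question (1000 requests per minute). The 2-second duration for one complete purchase is purely an assumed figure for illustration, and ThreadEstimate is a hypothetical helper.

```csharp
using System;

public static class ThreadEstimate
{
    // Average number of threads blocked at any moment when every request
    // holds one thread for its whole duration.
    public static double AverageBlockedThreads(
        double requestsPerSecond, double requestDurationInSeconds)
    {
        return requestsPerSecond * requestDurationInSeconds;
    }
}
```

For example, `ThreadEstimate.AverageBlockedThreads(1000.0 / 60.0, 2.0)` comes to roughly 33 threads, which under that assumed duration is well below the hundreds-of-threads range mentioned above.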
I'll link you my standard posts on whether to go sync or async because I feel you don't have absolute clarity for when async IO is appropriate.
https://stackoverflow.com/a/25087273/122718 Why does the EF 6 tutorial use asychronous calls?
https://stackoverflow.com/a/12796711/122718 Should we switch to use async I/O by default?
Generally, it is appropriate when the wait times are long and there are many parallel requests running.
My application does not have any "busy work" to do when waiting for a response
The other requests coming in are such busy work.
Note, that when a thread is blocked the CPU is not blocked as well. Another thread can run.
When you are doing async/await, you should be async all the way.
Read Async/Await - Best Practices in Asynchronous Programming
You need to make the methods async and have them return a Task
public virtual async Task<ConstituentResponse> PostConstituent(Constituent constituent)
{
var response = PostToUrl<Constituent>("/api/Constituents", constituent);
if (!response.IsSuccessStatusCode)
HandleError(response);
return await response.Content.ReadAsAsync<ConstituentResponse>();
}
//...
//etc
And then from the main function
await Task.WhenAll(constituentResponse, transactionResponse, donationResponse, creditCardPaymentResponse);
Edit: I misread the OP. Don't use await Task.WhenAll for calls that must run sequentially.