Azure Functions running in sequence, parallel desired - C#

I have an Azure function that I'm calling in parallel using PostAsync.
I add all my tasks to a list and then await the responses in parallel using Task.WhenAll.
I can confirm that there is a burst of HTTP activity out to Azure and then HTTP activity stops on my local machine while I wait for responses from Azure.
When I monitor the function in Azure Portal, it looks like the requests are arriving every three seconds or so, even though from my side there is no network traffic after the initial burst.
When I get my results back, they are arriving in sequence, in the exact same order I sent them out, even though the Azure Portal monitor indicates that some functions take 10 seconds to run and some take 3 seconds to run.
I am using Azure Functions version 1 with a Consumption plan.
CentralUSPlan (Consumption: 0 Small)
My host.json file is empty ==> {}
Why is this happening? Is there some setting that is required to get azure functions to execute in parallel?
public async Task<List<MyAnalysisObject>> DoMyAnalysisObjectsHttpRequestsAsync(List<MyAnalysisObject> myAnalysisObjectList)
{
    List<MyAnalysisObject> evaluatedObjects = new List<MyAnalysisObject>();
    using (var client = new HttpClient())
    {
        var tasks = new List<Task<MyAnalysisObject>>();
        foreach (var myAnalysisObject in myAnalysisObjectList)
        {
            tasks.Add(DoMyAnalysisObjectHttpRequestAsync(client, myAnalysisObject));
        }
        var evaluatedObjectsArray = await Task.WhenAll(tasks);
        evaluatedObjects.AddRange(evaluatedObjectsArray);
    }
    return evaluatedObjects;
}

public async Task<MyAnalysisObject> DoMyAnalysisObjectHttpRequestAsync(HttpClient client, MyAnalysisObject myAnalysisObject)
{
    string requestJson = JsonConvert.SerializeObject(myAnalysisObject);
    Console.WriteLine("Doing post-async:" + myAnalysisObject.Identifier);
    var response = await client.PostAsync(
        "https://myfunctionapp.azurewebsites.net/api/BuildMyAnalysisObject?code=XXX",
        new StringContent(requestJson, Encoding.UTF8, "application/json")
    );
    Console.WriteLine("Finished post-async:" + myAnalysisObject.Identifier);
    var result = await response.Content.ReadAsStringAsync();
    Console.WriteLine("Got result:" + myAnalysisObject.Identifier);
    return JsonConvert.DeserializeObject<MyAnalysisObject>(result);
}
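One thing worth ruling out before blaming the Functions host: Task.WhenAll returns its results in the order the tasks were passed in, not the order in which they finished, so the order of evaluatedObjects by itself says nothing about whether the calls ran one after another. A minimal sketch with simulated delays in place of the HTTP calls (SimulateCallAsync is a hypothetical stand-in, not part of the question):

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-ins for the HTTP calls: call 0 is the slowest.
int[] delaysMs = { 300, 100, 200 };
var completionOrder = new ConcurrentQueue<int>();

async Task<int> SimulateCallAsync(int id, int delayMs)
{
    await Task.Delay(delayMs);       // pretend the remote function runs this long
    completionOrder.Enqueue(id);     // record when this call actually finished
    return id;
}

// Start every call before awaiting any of them, as the question's code does.
var calls = delaysMs.Select((ms, id) => SimulateCallAsync(id, ms)).ToList();
int[] results = await Task.WhenAll(calls);

Console.WriteLine("WhenAll result order: " + string.Join(",", results));         // input order
Console.WriteLine("Completion order:     " + string.Join(",", completionOrder)); // finish order
```

The "Finished post-async" lines printed by DoMyAnalysisObjectHttpRequestAsync, by contrast, appear in completion order, so they are a better signal of whether the function calls overlapped.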

Related

Issue With HttpClient Bulk Parallel Requests in .NET Core C#

I have been struggling with this issue for about 3 weeks. Here's what I want to do.
I have about 2000 stock options. I want to fetch 5 of them at a time and process them, but it all has to run in parallel. I'll write out the steps to make it clearer.
1. Get 5 stock symbols from an array.
2. Send them off to fetch and process their data. Don't wait for a response; keep processing.
3. Wait 2.6 seconds (we are limited to 120 API requests per minute, so this delay keeps us throttled to about 115 per minute).
4. Go to step 1.
All the steps above have to run in parallel. I have written the code for it and it all seems to work fine, but it randomly crashes with
"A connection attempt failed because the connected party did not
properly respond after a period of time, or established connection
failed because connected host has failed to respond".
Sometimes it never happens and everything works like a charm.
The error is very random: it could show up on the 57th stock or on the 1829th. I have used HttpClient for this. I have tested the same scenario in Angular by creating custom requests, and it never crashes there, so it's not the third-party server's fault.
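The four-step loop described above can be sketched as a small driver. This is a minimal sketch, not the asker's code: RunChunkedAsync and fetchAsync are hypothetical names, with fetchAsync standing in for the per-symbol request. Results go into a ConcurrentBag because adding to a shared List<T> from concurrently running tasks is not thread-safe.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical driver: start up to `chunkSize` fetches without awaiting them,
// wait `delay`, repeat, then await everything once at the end.
async Task<List<TResult>> RunChunkedAsync<TItem, TResult>(
    IReadOnlyList<TItem> items,
    int chunkSize,
    TimeSpan delay,
    Func<TItem, Task<TResult>> fetchAsync)
{
    // ConcurrentBag is safe to add to from many tasks at once; a shared List<T>
    // (or reassigning list = list.Concat(...).ToList()) is not.
    var results = new ConcurrentBag<TResult>();
    var pending = new List<Task>();

    async Task FetchAndStoreAsync(TItem item) => results.Add(await fetchAsync(item));

    for (int start = 0; start < items.Count; start += chunkSize)
    {
        // Steps 1 and 2: take the next chunk and start its requests without waiting.
        foreach (var item in items.Skip(start).Take(chunkSize))
            pending.Add(FetchAndStoreAsync(item));

        // Step 3: pause before the next chunk to stay under the rate limit.
        if (start + chunkSize < items.Count)
            await Task.Delay(delay);
    }

    // Any failed request surfaces here instead of being lost.
    await Task.WhenAll(pending);
    return results.ToList();
}
```

With the numbers from the steps, a call would look like RunChunkedAsync(symbols, 5, TimeSpan.FromSeconds(2.6), FetchSymbolAsync), where FetchSymbolAsync is again a placeholder. Results come back in no particular order.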
What I have already done:
Changed HttpClient usage from a new instance every time to a single instance for the whole project.
Increased the ServicePointManager connection limit to a different number. (The default for .NET Core is 2.)
Used SemaphoreSlim for queuing and short-circuiting instead of relying on HttpClient's own queuing.
Forced ConnectionLeaseTimeout to 40 seconds to detect DNS changes, if any.
Changed async tasks to threads.
Tried almost everything from the internet.
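The ServicePointManager and ConnectionLeaseTimeout items in the list above target the .NET Framework networking stack. On .NET Core 2.1 and later, HttpClient uses SocketsHttpHandler by default, and ServicePointManager settings do not apply to it (the URL-fetching question further down makes the same point). A minimal sketch of the rough equivalents on SocketsHttpHandler; the values are placeholders rather than tuned recommendations, and ConnectTimeout requires .NET 5 or later.

```csharp
using System;
using System.Net.Http;

// Placeholder values; tune for the actual API and rate limit.
var handler = new SocketsHttpHandler
{
    // Recycle pooled connections periodically so DNS changes are picked up
    // (the role ConnectionLeaseTimeout played on .NET Framework).
    PooledConnectionLifetime = TimeSpan.FromSeconds(40),
    // Cap simultaneous connections per server (ServicePoint.ConnectionLimit's counterpart).
    MaxConnectionsPerServer = 10,
    // Give up on establishing a TCP/TLS connection after this long (.NET 5+).
    ConnectTimeout = TimeSpan.FromSeconds(15),
};

// Create once and reuse for the lifetime of the application.
var sharedClient = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(100) };
```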
My doubts:
I suspect it has something to do with the HttpClient class. I have read a lot of bad things about it, such as its misleading documentation.
My friend's doubt:
He said it could be because of concurrent tasks and I should change it to threads.
Here's the code:
// Inside Class Constructor
private readonly HttpClient HttpClient = new HttpClient();
SetMaxConcurrency(ApiBaseUrl, maxConcurrentRequests);

// SetMaxConcurrency function
private void SetMaxConcurrency(string url, int maxConcurrentRequests)
{
    ServicePointManager.FindServicePoint(new Uri(url)).ConnectionLimit = maxConcurrentRequests;
    ServicePointManager.FindServicePoint(new Uri(url)).ConnectionLeaseTimeout = 40 * 1000;
}
// Code for looping through chunks of symbols; each chunk has 5 symbols/stocks in it
foreach (var chunkedSymbol in chunkedSymbols)
{
    // Getting the OAuth token
    string AuthToken = await OAuth();
    if (String.IsNullOrEmpty(AuthToken))
    {
        throw new ArgumentNullException("Access Token is null!");
    }
    processingSymbols += chunkSize;
    OptionChainReq.symbol = chunkedSymbol.ToArray();

    async Task func()
    {
        // Function that makes the request
        var response = await GetOptionChain(AuthToken, ClientId, OptionChainReq);
        // Concat the result into the main list
        appResponses = appResponses.Concat(response).ToList();
    }

    // If requests reach 115, process the remaining requests first
    if (processingSymbols >= 115)
    {
        await Task.WhenAll(tasks);
        processingSymbols = 0;
    }
    tasks.Add(func());
    // 2600 millisecond delay to wait for all the data to process
    await Task.Delay(delay);
}
// Once the loop is completed, process the remaining requests
await Task.WhenAll(tasks);
// This code processes every symbol; it is inside GetOptionChain()
try
{
    var tasks = new List<Task>();
    foreach (string symbol in OptionChainReq.symbol)
    {
        List<KeyValuePair<string, string>> Params = new List<KeyValuePair<string, string>>();
        string requestParams = string.Empty;
        // Converting request params to key-value pairs.
        Params.Add(new KeyValuePair<string, string>("apikey", ClientId));
        // URL request query parameters.
        requestParams = new FormUrlEncodedContent(Params).ReadAsStringAsync().Result;
        string endpoint = ApiBaseUrl + "/marketdata/chains?";
        HttpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", OAuthToken);
        Uri tosUri = new Uri(endpoint + requestParams, UriKind.Absolute);

        async Task func()
        {
            try
            {
                string responseString = await GetTosData(tosUri);
                OptionChainResponse OptionChainRes = JsonConvert.DeserializeObject<OptionChainResponse>(responseString);
                var mappedOptionAppRes = MapOptionsAppRes(OptionChainRes);
                if (mappedOptionAppRes != null)
                {
                    OptionsData.Add(mappedOptionAppRes);
                }
            }
            catch (Exception ex)
            {
                throw new Exception("Crashed");
            }
        }
        // Asynchronously processing each request
        tasks.Add(func());
    }
    // Making sure all 5 requests are processed
    await Task.WhenAll(tasks);
}
catch (Exception ex)
{
    failedSymbols += " " + string.Join(",", OptionChainReq.symbol);
}
// The code below is for an individual request
public async Task<string> GetTosData(Uri url)
{
    try
    {
        await semaphore.WaitAsync();
        if (IsTripped())
        {
            return UNAVAILABLE;
        }
        var response = await HttpClient.GetAsync(url);
        if (response.StatusCode == System.Net.HttpStatusCode.Unauthorized)
        {
            string OAuthToken = await OAuth();
            HttpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", OAuthToken);
            return await GetTosData(url);
        }
        else if (response.StatusCode != HttpStatusCode.OK)
        {
            TripCircuit(reason: $"Status not OK. Status={response.StatusCode}");
            return UNAVAILABLE;
        }
        return await response.Content.ReadAsStringAsync();
    }
    catch (Exception ex) when (ex is OperationCanceledException || ex is TaskCanceledException)
    {
        Console.WriteLine("Timed out");
        TripCircuit(reason: $"Timed out");
        return UNAVAILABLE;
    }
    finally
    {
        semaphore.Release();
    }
}
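Three details in GetTosData above may be worth a look. The semaphore is acquired inside the try, whereas the usual pattern acquires it before the try so that finally only releases a slot that was actually taken. On a 401 the method calls itself while still holding its slot, so if every slot holder gets a 401 at the same moment, they all wait on each other. And DefaultRequestHeaders is shared by every request on the client; changing it while other requests are in flight is not thread-safe, whereas setting the header on each HttpRequestMessage avoids that. A minimal sketch under those assumptions, not a verified fix for the connection error: gate, BuildRequest and SendThrottledAsync are hypothetical names, and the 401 retry is left to the caller (catch the HttpRequestException, refresh the token, call again) so it happens after the slot has been released.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(5);   // hypothetical limit on in-flight requests

// Attach the token to this request only, instead of mutating
// HttpClient.DefaultRequestHeaders while other requests are running.
HttpRequestMessage BuildRequest(Uri url, string bearerToken)
{
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", bearerToken);
    return request;
}

// Acquire the slot before entering try, and never re-enter while holding it.
async Task<string> SendThrottledAsync(HttpClient client, Uri url, string token)
{
    await gate.WaitAsync();
    try
    {
        using var request = BuildRequest(url, token);
        using var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();   // 401 and other errors throw here
        return await response.Content.ReadAsStringAsync();
    }
    finally
    {
        gate.Release();                       // runs on success and on failure
    }
}
```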

.NET 5 HttpClient concurrency - performance

I created an HttpClient using IHttpClientFactory, sent 1000 GET calls in parallel to a Web API, and observed a delay of about 3-5 minutes for each request. Once this completed, I sent another 1000 GET requests in parallel, and this time there was no delay.
Then I increased the number of parallel requests to 2000. For the first batch, each request's delay was about 9-11 minutes. For the second batch of 2000 parallel requests, each request's delay was about 5 minutes (whereas with 1000 requests there was no delay).
var client = _clientFactory.CreateClient();
client.BaseAddress = new Uri("http://localhost:5000");
client.Timeout = TimeSpan.FromMinutes(20);
List<Task> _task = new List<Task>();
for (int i = 1; i <= 4000; i++)
{
    _task.Add(ExecuteRequest(client, i));
    if (i % 2000 == 0)
    {
        await Task.WhenAll(_task);
        _task.Clear();
    }
}

private async Task ExecuteRequest(HttpClient client, int requestId)
{
    var result = await client.GetAsync($"Performance/{requestId}");
    var response = await result.Content.ReadAsStringAsync();
    var data = JsonConvert.DeserializeObject<Response>(response);
}
I'm trying to understand:
How many parallel requests does HttpClient support without delay?
How can I improve the performance of HttpClient for 2000 or more parallel requests?
How many parallel requests does HttpClient support without delay?
On modern .NET Core platforms, you're limited only by available memory. There's no built-in throttling that's on by default.
How can I improve the performance of HttpClient for 2000 or more parallel requests?
It sounds like you're being throttled by your server. If you want to test a more scalable server, try running this in your server's startup:
var desiredThreads = 2000;
ThreadPool.GetMaxThreads(out _, out var maxIoThreads);
ThreadPool.SetMaxThreads(desiredThreads, maxIoThreads);
ThreadPool.GetMinThreads(out _, out var minIoThreads);
ThreadPool.SetMinThreads(desiredThreads, minIoThreads);
What you're doing is causing worst-case perf for a "cold" (just newed up or empty connection pool) HttpClient.
When you make a new request, it looks for an open connection in the connection pool. When it doesn't find one, it tries to open up a new connection. By throwing a sudden burst at a cold client, most calls to SendAsync will end up trying to open a new connection.
This is a problem because a request that needs a new connection will require multiple round-trips to the server, whereas a request on an existing connection will only require a single round-trip. It gets even worse if you use HTTPS. You're heavily dependent on your network latency in this case.
If you are just benchmarking, then you'll want to benchmark steady-state performance, not warmup performance. BenchmarkDotNet should more or less do this for you.
When you have requests that complete reasonably quickly, it can be a lot faster to instead limit your initial concurrency to a smaller percentage of your total requests, and slowly ramp up your connection pool size from there. This allows subsequent requests to reuse connections. What you might try is something like the following, which will only allow (roughly, not guaranteed) 10 new connections to be opened at once:
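var sem = new SemaphoreSlim(10);
var client = new HttpClient();

async Task<HttpResponseMessage> MakeRequestAsync(HttpRequestMessage req)
{
    Task t = sem.WaitAsync();
    // A slot was free immediately, so this request will likely open a new connection.
    bool openNew = t.IsCompleted;
    await t;
    try
    {
        return await client.SendAsync(req);
    }
    finally
    {
        // Releasing 2 instead of 1 grows the limit by one, ramping up the pool gradually.
        sem.Release(openNew ? 2 : 1);
    }
}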

HttpClient async requests failing

I need to fetch content from some 3000 URLs. I'm using HttpClient: I create a Task for each URL, add the tasks to a list, and then await Task.WhenAll. Something like this:
var tasks = new List<Task<string>>();
foreach (var url in urls) {
    var task = Task.Run(() => httpClient.GetStringAsync(url));
    tasks.Add(task);
}
var t = Task.WhenAll(tasks);
However, many tasks end up in the Faulted or Canceled state. I thought it might be a problem with the specific URLs, but no: I can fetch those URLs in parallel with curl without any problem.
I tried HttpClientHandler and WinHttpHandler with various timeouts, etc. Several hundred URLs always end with an error.
Then I tried fetching those URLs in batches of 10, and that works: no errors, but very slow. Curl fetches all 3000 URLs in parallel very quickly.
Then I tried to GET httpbin.org 3000 times to verify that the issue is not with my particular URLs:
var handler = new HttpClientHandler() { MaxConnectionsPerServer = 5000 };
var httpClient = new HttpClient(handler);
var tasks = new List<Task<HttpResponseMessage>>();
foreach (var _ in Enumerable.Range(1, 3000)) {
    var task = Task.Run(() => httpClient.GetAsync("http://httpbin.org"));
    tasks.Add(task);
}
var t = Task.WhenAll(tasks);
try { await t.ConfigureAwait(false); } catch { }
int ok = 0, faulted = 0, cancelled = 0;
foreach (var task in tasks) {
    switch (task.Status) {
        case TaskStatus.RanToCompletion: ok++; break;
        case TaskStatus.Faulted: faulted++; break;
        case TaskStatus.Canceled: cancelled++; break;
    }
}
Console.WriteLine($"RanToCompletion: {ok} Faulted: {faulted} Canceled: {cancelled}");
Again, always several hundred Tasks end in error.
So, what is the issue here? Why can't I get those URLs with async?
I'm using .NET Core, so the suggestion to use ServicePointManager (Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)) is not applicable.
Also, the URLs I need to fetch point to different hosts. The httpbin code is just a test to show that the problem is not that my URLs are invalid.
As Fildor said in the comments, httpClient.GetStringAsync already returns a Task<string>, so you don't need to wrap it in Task.Run.
I ran this code in a console app. It took 50 seconds to complete. In your comment, you wrote that curl performs 3000 queries in less than a minute, which is about the same.
var httpClient = new HttpClient();
var tasks = new List<Task<string>>();
var sw = Stopwatch.StartNew();
for (int i = 0; i < 3000; i++)
{
    var task = httpClient.GetStringAsync("http://httpbin.org");
    tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
sw.Stop();
Console.WriteLine(sw.Elapsed);
Console.WriteLine(tasks.All(t => t.IsCompleted));
Also, all requests were completed successfully.
In your code, you are waiting for tasks started using Task.Run. But you need to wait for the completion of tasks started by calling httpClient.Get...
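Fixed batches of 10 are slow because every batch waits for its slowest request before the next batch starts. A common alternative, swapped in here rather than taken from the answer above, is to cap the number of requests in flight with SemaphoreSlim, so a new request starts as soon as any slot frees up. A minimal sketch; ForEachThrottledAsync is a hypothetical helper, and the right limit depends on the hosts involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: keep at most `maxConcurrency` operations in flight.
// A new one starts as soon as any running one finishes.
async Task<TResult[]> ForEachThrottledAsync<TItem, TResult>(
    IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
{
    using var gate = new SemaphoreSlim(maxConcurrency);

    async Task<TResult> RunOneAsync(TItem item)
    {
        await gate.WaitAsync();          // wait for a free slot
        try { return await work(item); }
        finally { gate.Release(); }      // free the slot even if work throws
    }

    // Like Task.WhenAll, results come back in input order.
    return await Task.WhenAll(items.Select(item => RunOneAsync(item)));
}
```

For the URL list, a call might look like ForEachThrottledAsync(urls, 50, url => httpClient.GetStringAsync(url)), with GetStringAsync called directly rather than wrapped in Task.Run; 50 is an arbitrary starting point.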

Azure ML web service times out

I have created a simple experiment in Azure ML and trigger it with an http client. In Azure ML workspace, everything works ok when executed. However, the experiment times out and fails when I trigger the experiment using an http client. Setting a timeout value for the http client does not seem to work.
Is there any way we can set this timeout value so that the experiment does not fail?
Make sure you're setting the client timeout value correctly. If the server powering the web service times out, then it will send back a response with the HTTP status code 504 BackendScoreTimeout (or possibly 409 GatewayTimeout). However, if you simply never receive a response, then your client isn't waiting long enough.
You can work out a good value by running your experiment in ML Studio. Go to the experiment properties to see how long it ran, and then aim for about twice that as the timeout value.
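A minimal sketch of the client side of this advice, assuming a hypothetical observed run time of 70 seconds. HttpClient.Timeout defaults to 100 seconds, and it can only be changed before the instance sends its first request. A longer client timeout will not help if the service itself stops the experiment, as the next answer describes.

```csharp
using System;
using System.Net.Http;

// Hypothetical figure: run time shown in the experiment properties in ML Studio.
TimeSpan observedRunTime = TimeSpan.FromSeconds(70);

// Roughly twice the observed run time. Set it before the first request;
// changing Timeout afterwards throws InvalidOperationException.
var scoringClient = new HttpClient { Timeout = observedRunTime * 2 };

Console.WriteLine($"Client timeout: {scoringClient.Timeout.TotalSeconds} s");
```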
I've had similar problems with an Azure ML experiment published as a web service. Most of the time it ran OK, but sometimes it returned a timeout error. The problem is that the experiment itself has a 90-second running time limit. So most probably your experiment runs longer than this limit and returns a timeout error. hth
It looks like it isn't possible to set this timeout, based on a feature request that is still marked as "planned" as of 4/1/2018.
The recommendation from the MSDN forums in 2017 is to use the Batch Execution Service, which starts the machine learning experiment and then lets you asynchronously check whether it's done.
Here's a code snippet from the Azure ML Web Services Management Sample Code (all comments are from their sample code):
using (HttpClient client = new HttpClient())
{
    var request = new BatchExecutionRequest()
    {
        Outputs = new Dictionary<string, AzureBlobDataReference>() {
            {
                "output",
                new AzureBlobDataReference()
                {
                    ConnectionString = storageConnectionString,
                    RelativeLocation = string.Format("{0}/outputresults.file_extension", StorageContainerName) /*Replace this with the location you would like to use for your output file, and valid file extension (usually .csv for scoring results, or .ilearner for trained models)*/
                }
            },
        },
        GlobalParameters = new Dictionary<string, string>() {
        }
    };
    client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
    // WARNING: The 'await' statement below can result in a deadlock
    // if you are calling this code from the UI thread of an ASP.Net application.
    // One way to address this would be to call ConfigureAwait(false)
    // so that the execution does not attempt to resume on the original context.
    // For instance, replace code such as:
    // result = await DoSomeTask()
    // with the following:
    // result = await DoSomeTask().ConfigureAwait(false)
    Console.WriteLine("Submitting the job...");
    // submit the job
    var response = await client.PostAsJsonAsync(BaseUrl + "?api-version=2.0", request);
    if (!response.IsSuccessStatusCode)
    {
        await WriteFailedResponse(response);
        return;
    }
    string jobId = await response.Content.ReadAsAsync<string>();
    Console.WriteLine(string.Format("Job ID: {0}", jobId));
    // start the job
    Console.WriteLine("Starting the job...");
    response = await client.PostAsync(BaseUrl + "/" + jobId + "/start?api-version=2.0", null);
    if (!response.IsSuccessStatusCode)
    {
        await WriteFailedResponse(response);
        return;
    }
    string jobLocation = BaseUrl + "/" + jobId + "?api-version=2.0";
    Stopwatch watch = Stopwatch.StartNew();
    bool done = false;
    while (!done)
    {
        Console.WriteLine("Checking the job status...");
        response = await client.GetAsync(jobLocation);
        if (!response.IsSuccessStatusCode)
        {
            await WriteFailedResponse(response);
            return;
        }
        BatchScoreStatus status = await response.Content.ReadAsAsync<BatchScoreStatus>();
        if (watch.ElapsedMilliseconds > TimeOutInMilliseconds)
        {
            done = true;
            Console.WriteLine(string.Format("Timed out. Deleting job {0} ...", jobId));
            await client.DeleteAsync(jobLocation);
        }
        switch (status.StatusCode) {
            case BatchScoreStatusCode.NotStarted:
                Console.WriteLine(string.Format("Job {0} not yet started...", jobId));
                break;
            case BatchScoreStatusCode.Running:
                Console.WriteLine(string.Format("Job {0} running...", jobId));
                break;
            case BatchScoreStatusCode.Failed:
                Console.WriteLine(string.Format("Job {0} failed!", jobId));
                Console.WriteLine(string.Format("Error details: {0}", status.Details));
                done = true;
                break;
            case BatchScoreStatusCode.Cancelled:
                Console.WriteLine(string.Format("Job {0} cancelled!", jobId));
                done = true;
                break;
            case BatchScoreStatusCode.Finished:
                done = true;
                Console.WriteLine(string.Format("Job {0} finished!", jobId));
                ProcessResults(status);
                break;
        }
        if (!done) {
            Thread.Sleep(1000); // Wait one second
        }
    }
}
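The sample above blocks with Thread.Sleep inside an async method. A minimal sketch of the same poll-until-done idea using await Task.Delay, with a CancellationTokenSource enforcing the overall timeout. PollUntilDoneAsync and checkStatusAsync are hypothetical names; in the sample's terms, checkStatusAsync would wrap the GET on jobLocation and return true once the BatchScoreStatusCode is a terminal state.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: poll until checkStatusAsync reports completion or the
// timeout elapses. Returns false on timeout so the caller can delete the job.
async Task<bool> PollUntilDoneAsync(
    Func<CancellationToken, Task<bool>> checkStatusAsync,
    TimeSpan interval,
    TimeSpan timeout)
{
    using var cts = new CancellationTokenSource(timeout);
    try
    {
        while (true)
        {
            if (await checkStatusAsync(cts.Token))
                return true;                          // job reached a final state
            await Task.Delay(interval, cts.Token);    // wait without blocking a thread
        }
    }
    catch (OperationCanceledException) when (cts.IsCancellationRequested)
    {
        return false;                                 // overall timeout elapsed
    }
}
```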

.NET HttpClient.PostAsync() slow after 3 requests

I am using the .NET 4.5 HttpClient class to make a POST request to a server a number of times. The first 3 calls run quickly, but the fourth time a call to await client.PostAsync(...) is made, it hangs for several seconds before returning the expected response.
using (HttpClient client = new HttpClient())
{
    // Prepare query
    StringBuilder queryBuilder = new StringBuilder();
    queryBuilder.Append("?arg=value");
    // Send query
    using (var result = await client.PostAsync(BaseUrl + queryBuilder.ToString(),
        new StreamContent(streamData)))
    {
        Stream stream = await result.Content.ReadAsStreamAsync();
        return new MyResult(stream);
    }
}
The server code is shown below:
HttpListener listener;

void Run()
{
    listener.Start();
    ThreadPool.QueueUserWorkItem((o) =>
    {
        while (listener.IsListening)
        {
            ThreadPool.QueueUserWorkItem((c) =>
            {
                var context = c as HttpListenerContext;
                try
                {
                    // Handle request
                }
                finally
                {
                    // Always close the stream
                    context.Response.OutputStream.Close();
                }
            }, listener.GetContext());
        }
    });
}
Inserting a debug statement at // Handle request shows that the server code doesn't seem to receive the request as soon as it is sent.
I have already investigated whether it could be a problem with the client not closing the response, meaning that the number of connections the ServicePoint provider allows could be reached. However, I have tried increasing ServicePointManager.MaxServicePoints but this has no effect at all.
I also found this similar question:
.NET HttpClient hangs after several requests (unless Fiddler is active)
I don't believe this is the problem with my code - even changing my code to exactly what is given there didn't fix the problem.
The problem was that there were too many Task instances scheduled to run.
Changing some of the Task.Factory.StartNew calls in my program for tasks which ran for a long time to use the TaskCreationOptions.LongRunning option fixed this. It appears that the task scheduler was waiting for other tasks to finish before it scheduled the request to the server.
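A minimal sketch of that fix in isolation: a long-lived, mostly blocking loop (a stand-in for something like the listener.GetContext() loop) started with TaskCreationOptions.LongRunning. The option is a hint to the scheduler; with the default scheduler, current .NET versions honor it by running the task on a dedicated thread instead of occupying a thread-pool thread, which is what the sketch reports.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for a loop that blocks for most of its life.
Task<bool> blockingLoop = Task.Factory.StartNew(
    () =>
    {
        for (int i = 0; i < 5; i++)
            Thread.Sleep(20);                           // simulated blocking call
        return Thread.CurrentThread.IsThreadPoolThread; // report where it ran
    },
    CancellationToken.None,
    TaskCreationOptions.LongRunning,                    // ask for a dedicated thread
    TaskScheduler.Default);

bool ranOnPoolThread = await blockingLoop;
Console.WriteLine($"Ran on a thread-pool thread: {ranOnPoolThread}");
```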
