HttpClient DocumentClient What Happens With Different Threads. Azure Functions - C#

I read in an article about HttpClient/DocumentClient that it can be best practice to create a singleton for apps and inject it into objects so that underlying resources are not exhausted by continually re-creating it. How does this work? If an HttpClient is being accessed by various threads and making simultaneous calls to possibly different endpoints I can't see how this can work.
I read this
https://medium.com/#nuno.caneco/c-httpclient-should-not-be-disposed-or-should-it-45d2a8f568bc
with interest. If I have an Azure Function making use of a DocumentClient calling cosmosDb how should I use the DocumentClient? Should it be a static instance?
I have my Azure function set up like this. I presume a new instance of DocumentClient is being created with every request, which under high load could cause resource problems.
[FunctionName("MyGetFunc")]
public static async Task<IActionResult> Run(
[HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
[CosmosDB("ct","ops", ConnectionStringSetting ="cosmosConn")]
DocumentClient docClient,
ILogger log)
{
//use docClient here...
}

For the DocumentClient part of your question see here: https://learn.microsoft.com/en-us/sandbox/functions-recipes/cosmos-db?tabs=csharp#customize-a-documentclient-and-reuse-it-between-executions
They talk about the different scenarios. So yes, if you have many Function invocations, I would use one static instance - which is also thread-safe.
private static DocumentClient client = GetCustomClient();
private static DocumentClient GetCustomClient()
{
DocumentClient customClient = new DocumentClient(
new Uri(ConfigurationManager.AppSettings["CosmosDBAccountEndpoint"]),
ConfigurationManager.AppSettings["CosmosDBAccountKey"],
new ConnectionPolicy
{
ConnectionMode = ConnectionMode.Direct,
ConnectionProtocol = Protocol.Tcp,
// Customize retry options for Throttled requests
RetryOptions = new RetryOptions()
{
MaxRetryAttemptsOnThrottledRequests = 10,
MaxRetryWaitTimeInSeconds = 30
}
});
// Customize PreferredLocations
customClient.ConnectionPolicy.PreferredLocations.Add(LocationNames.CentralUS);
customClient.ConnectionPolicy.PreferredLocations.Add(LocationNames.NorthEurope);
return customClient;
}
[FunctionName("CosmosDbSample")]
public static async Task<HttpResponseMessage> Run(

If an HttpClient is being accessed by various threads and making simultaneous calls to possibly different endpoints I can't see how this can work.
Why? The HttpClient is thread-safe which means that it can be used from several concurrent threads simultaneously.
Is HttpClient safe to use concurrently?
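A minimal sketch of why concurrent use works: the per-call state lives in each HttpRequestMessage and its returned Task, not in the shared client. The EchoHandler class and the example URLs are made up for illustration and stand in for the network, so the sample runs offline.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the network: echoes the requested URL back.
sealed class EchoHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(10, cancellationToken); // simulate some latency
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(request.RequestUri?.ToString() ?? "")
        };
    }
}

static class Program
{
    // One shared instance for the whole process.
    private static readonly HttpClient Client = new HttpClient(new EchoHandler());

    static async Task Main()
    {
        // 20 concurrent calls to four different (made-up) hosts.
        string[] urls = Enumerable.Range(0, 20)
            .Select(i => $"https://host{i % 4}.example/items/{i}")
            .ToArray();

        string[] bodies = await Task.WhenAll(urls.Select(u => Client.GetStringAsync(u)));

        // Each caller receives the response that belongs to its own request.
        Console.WriteLine(bodies.SequenceEqual(urls));
    }
}
```

Task.WhenAll returns results in the same order as the input tasks, which is why the two sequences can be compared directly.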

If you're using .NET Core, please also refer to Use HttpClientFactory to implement resilient HTTP requests.
HttpClient is intended to be instantiated once and reused throughout the life of an application. Instantiating an HttpClient class for every request will exhaust the number of sockets available under heavy loads. That issue will result in SocketException errors. Possible approaches to solve that problem are based on the creation of the HttpClient object as singleton or static, as explained in this Microsoft article on HttpClient usage.
But there’s a second issue with HttpClient that you can have when you use it as singleton or static object. In this case, a singleton or static HttpClient doesn't respect DNS changes, as explained in this issue at the .NET Core GitHub repo.
To address those mentioned issues and make the management of HttpClient instances easier, .NET Core 2.1 introduced a new HttpClientFactory that can also be used to implement resilient HTTP calls by integrating Polly with it.

[CosmosDB("ct","ops", ConnectionStringSetting ="cosmosConn")]
DocumentClient docClient,
That is using the Cosmos DB Binding. The Binding does not create multiple instances of the DocumentClient; it creates one and reuses it in all executions.
You can check the source code here: https://github.com/Azure/azure-webjobs-sdk-extensions/blob/dev/src/WebJobs.Extensions.CosmosDB/Bindings/CosmosDBClientBuilder.cs.
It calls GetService and obtains the DocumentClient instance for that particular connection string if one already was created in a previous execution.
This is similar to maintaining your own static/Lazy DocumentClient (see https://learn.microsoft.com/en-us/azure/azure-functions/manage-connections#documentclient-code-example-c).
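The static/Lazy pattern can be sketched without the Cosmos DB packages. ExpensiveClient below is a hypothetical stand-in for DocumentClient, and ClientHolder is an illustrative name; the point is only that the factory runs once no matter how many executions touch it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an expensive client such as DocumentClient.
sealed class ExpensiveClient
{
    public static int Created; // counts constructor calls
    public ExpensiveClient() { Interlocked.Increment(ref Created); }
}

static class ClientHolder
{
    // Lazy<T> built from a factory delegate defaults to
    // LazyThreadSafetyMode.ExecutionAndPublication, so the factory runs
    // at most once even when the first access happens concurrently.
    private static readonly Lazy<ExpensiveClient> LazyClient =
        new Lazy<ExpensiveClient>(() => new ExpensiveClient());

    public static ExpensiveClient Client => LazyClient.Value;
}

static class Program
{
    static void Main()
    {
        // Simulate 100 function executions asking for the client in parallel.
        Parallel.For(0, 100, _ => { var c = ClientHolder.Client; });
        Console.WriteLine(ExpensiveClient.Created); // prints 1
    }
}
```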

Related

Reuse httpClient created via the HttpClientFactory in different methods of the same class? (C# / .NET)

I have a class into which the IHttpClientFactory is injected via the constructor. There's also a HttpClient private field in this class.
Are there any issues with creating the HttpClient in the constructor, using the factory, and then reusing that HttpClient in two/multiple methods within that one class to make two/multiple different api calls? (Same Api, different endpoints)
Or would it be better to use the factory in each method to create a new client. What are the implications/pros & cons of each approach? Is any one inherently better or doesn't it matter?
private readonly HttpClient _httpClient;
public RestClient(IHttpClientFactory httpClientFactory)
{
_httpClient = httpClientFactory.CreateClient();
}
public async Task<SomeResponse> Method1(SomeRequest request)
{
...
using (var httpRequestMessage = new HttpRequestMessage(HttpMethod.Post, url))
{
httpRequestMessage.Headers.Add("Accept", "application/json");
httpRequestMessage.Headers.Add("Authorization", "Basic " + credentials);
httpRequestMessage.Content = new StringContent(jsonBody, Encoding.UTF8, "application/json");
using (var response = await _httpClient.SendAsync(httpRequestMessage))
{
...
}
}
...
}
public async Task<SomeOtherResponse> Method2(someInput)
{
...
using (var httpRequestMessage = new HttpRequestMessage(HttpMethod.Get, uri.ToString()))
{
httpRequestMessage.Headers.Add("Accept", "image/png");
httpRequestMessage.Headers.Add("Authorization", "Basic " + credentials);
using (var response = await _httpClient.SendAsync(httpRequestMessage))
{
...
}
}
...
}
Edit: have looked at this post Should I cache and reuse HttpClient created from HttpClientFactory? but it doesn't answer my questions. If there is something to be derived from there please explain.
I think you are looking for this guidance from Microsoft: Guidelines for using HttpClient
I copy here the related part
Recommended use
In .NET Core and .NET 5+:
Use a static or singleton HttpClient instance with PooledConnectionLifetime set to the desired interval, such as two minutes, depending on expected DNS changes. This solves both the socket exhaustion and DNS changes problems without adding the overhead of IHttpClientFactory. If you need to be able to mock your handler, you can register it separately.
Using IHttpClientFactory, you can have multiple, differently configured clients for different use cases. However, be aware that the factory-created clients are intended to be short-lived, and once the client is created, the factory no longer has control over it.
The factory pools HttpMessageHandler instances, and, if its lifetime hasn't expired, a handler can be reused from the pool when the factory creates a new HttpClient instance. This reuse avoids any socket exhaustion issues.
If you desire the configurability that IHttpClientFactory provides, we recommend using the typed-client approach.
In .NET Framework:
Use IHttpClientFactory to manage your HttpClient instances. If you create a new client instance for each request, you can exhaust available sockets.
Tip
If your app requires cookies, consider disabling automatic cookie handling or avoiding IHttpClientFactory. Pooling the HttpMessageHandler instances results in sharing of CookieContainer objects. Unanticipated CookieContainer object sharing often results in incorrect code.
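For the first recommendation in the guidance quoted above, a minimal sketch of a shared client with a bounded connection lifetime might look like this. SharedHttp is an illustrative name, and the two-minute value is simply the interval the guidance mentions; SocketsHttpHandler requires .NET Core 2.1 or later.

```csharp
using System;
using System.Net.Http;

static class SharedHttp
{
    // Pooled connections are retired after two minutes, so a later request
    // opens a fresh connection and picks up any DNS change.
    public static readonly SocketsHttpHandler Handler = new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(2)
    };

    // One client for the whole application, reused by every caller.
    public static readonly HttpClient Client = new HttpClient(Handler);
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(SharedHttp.Handler.PooledConnectionLifetime); // prints 00:02:00
    }
}
```

In the RestClient class from the question, the injected field could then be replaced by SharedHttp.Client, or the factory-created client kept as is; both avoid creating a client per call.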

Post HTTP request without awaiting the result

I have the following endpoint:
[HttpPost("Submit")]
public String post()
{
_ = _service.SubmitMetric("test", MetricType.Count, 60, 1);
return "done";
}
And the service implementation:
public Task<HttpResponseMessage> SubmitMetric(<params>)
{
// build payload
using (var httpClient = new HttpClient())
{
return httpClient.PostAsync(<params>);
}
}
When I run the code and call the endpoint, the HTTP POST is not triggered. However, if I change my code to:
public async Task<HttpResponseMessage> SubmitMetric(<params>)
{
// build payload
using (var httpClient = new HttpClient())
{
return await httpClient.PostAsync(<params>);
}
}
the POST is submitted as expected. Why is that happening, and what can I do if I don't really care about the HTTP response? I just want to submit it and continue my flow. Shouldn't I be able to use it without awaiting the result? For example:
public void SubmitMetric(<params>)
{
// build payload
using (var httpClient = new HttpClient())
{
httpClient.PostAsync(<params>);
}
}
There are two problems with this code. If either was fixed, there would be no problem:
The HttpClient is used incorrectly. An HttpClient object is thread-safe and meant to be reused, not disposed. Disposing it like this leaks sockets and can result in application crashes or, worse, instability. An HttpClient resolves the URL's Host to a socket and caches that socket. The OS also caches opened sockets because opening them is expensive. They're kept alive for a while even if an application closes them because some packets may still be in transit.
By not awaiting PostAsync, execution exits the using block and the HttpClient instance is disposed before the request had a chance to even start.
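The second problem can be reproduced offline. This is a sketch under assumptions: SlowHandler and the metrics URL are invented for the demo, and the behaviour relied on is that HttpClient.Dispose cancels requests that are still pending, which is what I would expect on current .NET runtimes.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical slow endpoint: responds after one second unless cancelled.
sealed class SlowHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}

static class Program
{
    // Mirrors the question's first version: the task is returned un-awaited,
    // so the using block disposes the client while the request is in flight.
    public static Task<HttpResponseMessage> SubmitWithoutAwait()
    {
        using (var client = new HttpClient(new SlowHandler()))
        {
            return client.PostAsync("https://metrics.example/submit", new StringContent("{}"));
        }
    }

    static async Task Main()
    {
        try
        {
            await SubmitWithoutAwait();
            Console.WriteLine("completed");
        }
        catch (OperationCanceledException)
        {
            // Reached when Dispose cancelled the pending request.
            Console.WriteLine("cancelled by Dispose");
        }
    }
}
```

Adding async/await inside SubmitWithoutAwait keeps execution inside the using block until the response arrives, which is why the awaited version in the question works.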
In any case, making a POST doesn't take long so there's no need to make the method fire-and-forget. Besides, few applications are OK with losing metrics, especially when things go wrong. That's when metrics are most useful.
Which is why ASP.NET Core 6 adds built-in support for OpenTelemetry tracing and metrics. More on that at the end, but the supporting packages can be used in ASP.NET Framework as well. You may be able to replace your current service with a built-in one.
Use await - not enough
One way to fix this is to use await but that doesn't solve the HttpClient usage problem.
public async Task<HttpResponseMessage> SubmitMetric(<params>)
{
// build payload
using (var httpClient = new HttpClient())
{
return await httpClient.PostAsync(<params>);
}
}
At the very least the HttpClient should be stored in a field. Once that's done though, there's no longer any reason to await, provided the service itself is still around:
HttpClient httpClient = new HttpClient();
public Task<HttpResponseMessage> SubmitMetric(<params>)
{
return httpClient.PostAsync(<params>);
}
Long lived services
Which brings us to keeping the service around. In ASP.NET and ASP.NET Core each request is handled by a new instance of the Controller class. The request also acts as a dependency-injection scope, so scoped and transient services created for it, along with anything they hold such as an HttpClient field, are released once the request concludes.
To keep the Metrics service around we need to either register it as Singleton in ASP.NET Core's DI, make it a BackgroundService or ensure it's a singleton in ASP.NET Framework. We could make the field static, but that leads to the next issue.
Proper HttpClient usage
HttpClient can still cause problems if used as a singleton. The HttpClient caches sockets to specific machines. If that machine goes away, the HttpClient will still try to communicate with it, causing errors. This can happen easily when the remote service uses a load balancer or fails over to a new server. To fix this, the HttpClient instance, or rather its sockets, need to be recycled periodically.
That's the job of the HttpClientFactory. This class caches and recycles HttpMessageHandler instances (SocketsHttpHandler on .NET Core), the classes that do the actual work in an HttpClient. These are recycled periodically, by default every 2 minutes. When asked for a new HttpClient instance, it creates a new instance wrapping one of the already available handlers.
When you use services.AddHttpClient in ASP.NET Core you're actually configuring an HttpClientFactory. When you add an HttpClient dependency in a controller, the instance will be created by the configured HttpClientFactory.
This means that the following action would work properly:
HttpClient _client;
public MyController(HttpClient client)
{
_client=client;
}
[HttpPost("Submit")]
public async Task<String> post()
{
// the action must be async to await the call
await _client.PostAsync(<params>);
return "done";
}
A scoped service with an HttpClient dependency would also work:
MyService _service;
public MyController(MyService service)
{
_service=service;
}
[HttpPost("Submit")]
public async Task<String> post()
{
await _service.SubmitMetric("test", MetricType.Count, 60, 1);
return "done";
}
where MyService is :
class MyService
{
HttpClient _client;
public MyService(HttpClient client)
{
_client=client;
}
public Task<HttpResponseMessage> SubmitMetric(<params>)
{
// build payload
return _client.PostAsync(<params>);
}
}
In this case there's no real need to await inside SubmitMetric, that's taken care of by the action.
Using the built-in OpenTelemetry tracing and metrics
ASP.NET Core 6, the upcoming Long-Term-Support version, adds native support for the OpenTelemetry standard for logging, tracing and metrics. This allows using a standard API to push metrics to a lot of different observability applications like Prometheus, Jaeger, Zipkin, Elastic and Splunk.
Instead of rolling one's own metrics infrastructure it's better to use the standard API. OpenTelemetry for .NET supports this in ASP.NET Framework 4.6 and later. ASP.NET Core 5 and later are instrumented to publish metrics and tracing to OpenTelemetry providers through the built-in System.Diagnostics namespace and the Activity class.
In fact, Controller is already instrumented so you could get rid of the metrics service, adding any Tags and Baggage to the request's current activity:
[HttpPost("Submit")]
public String post()
{
// AddTag needs a key and a value; the "request" key is only an example
Activity.Current?.AddTag("request", "test");
...
return "done";
}
Metrics were added in ASP.NET Core 6 Preview 5:
Meter meter = new Meter("my.library.meter.name", "v1.0");
Counter<int> _counter;
public MyController(...)
{
_counter = meter.CreateCounter<int>("Requests");
}
[HttpPost("Submit")]
public String post()
{
_counter.Add(60, KeyValuePair.Create<string, object>("request", "test"));
return "done";
}
Don't do it. Await it even though you discard the result.
Fire and forget is an anti-pattern: the context in which you perform the request can be invalidated or killed before the request completes, terminating the connection. Just await it, and don't do anything with the result.
Without await, httpClient will be disposed while the POST operation is running, probably resulting in killing the socket. If you use await, the object will remain inside the using clause while the operation is running, and it won't be terminated before it finishes.
Note that in your current implementation, you're creating a new connection on each API request, which might eventually lead to socket exhaustion. A better approach would be injecting IHttpClientFactory, which manages the lifetime of network connections for you and reuses connections from the pool:
public class MyService
{
private readonly IHttpClientFactory _httpClient;
public MyService(IHttpClientFactory httpClient)
{
_httpClient = httpClient;
}
public async Task<HttpResponseMessage> SubmitMetric(/*<params>*/)
{
var httpClient = _httpClient.CreateClient();
return await httpClient.PostAsync(/*<params>*/);
}
}
Note: You need to add services.AddHttpClient() in ConfigureServices in your Startup.cs to enable injection.
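Outside ASP.NET Core's Startup, the same wiring can be sketched in a console program. This assumes the Microsoft.Extensions.Http and Microsoft.Extensions.DependencyInjection packages are referenced; the NewClient method and the singleton lifetime are illustrative choices, not part of the answer above.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

public class MyService
{
    private readonly IHttpClientFactory _factory;
    public MyService(IHttpClientFactory factory) { _factory = factory; }

    // Hypothetical helper exposing what the factory hands out.
    public HttpClient NewClient() => _factory.CreateClient();
}

static class Program
{
    static void Main()
    {
        var services = new ServiceCollection();
        services.AddHttpClient();           // registers IHttpClientFactory
        services.AddSingleton<MyService>(); // lifetime chosen for the demo

        using (var provider = services.BuildServiceProvider())
        {
            var service = provider.GetRequiredService<MyService>();

            // Each call returns a new, cheap HttpClient wrapper; the pooled
            // message handlers behind them are what the factory reuses.
            Console.WriteLine(ReferenceEquals(service.NewClient(), service.NewClient())); // prints False
        }
    }
}
```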

Why using HttpClient in a using block IS WRONG in WebApi context?

So, the question is why the usage of HttpClient in using block is WRONG, BUT in WebApi context?
I've been reading this article Don't Block on Async Code. In it we have the following example:
public static async Task<JObject> GetJsonAsync(Uri uri)
{
// (real-world code shouldn't use HttpClient in a using block; this is just example code)
using (var client = new HttpClient())
{
var jsonString = await client.GetStringAsync(uri);
return JObject.Parse(jsonString);
}
}
// My "top-level" method.
public class MyController : ApiController
{
public string Get()
{
var jsonTask = GetJsonAsync(...);
return jsonTask.Result.ToString();
}
}
The comment // (real-world code shouldn't use HttpClient in a using block; this is just example code) just triggered me. I've been always using HttpClient in this way.
The next thing I've checked is Microsoft's documentation on HttpClient Class.
In it, we have the following statement with provided source sample:
HttpClient is intended to be instantiated once and re-used throughout
the life of an application. Instantiating an HttpClient class for
every request will exhaust the number of sockets available under heavy
loads. This will result in SocketException errors. Below is an example
using HttpClient correctly.
public class GoodController : ApiController
{
private static readonly HttpClient HttpClient;
static GoodController()
{
HttpClient = new HttpClient();
}
}
So isn't the constructor called on each request and thus a new HttpClient will be created every time?
Thanks!
There's a bit of a long answer to this...
Originally, the official recommendation was to use HttpClient in a using block. But this caused problems at scale, essentially using up lots of connections in the TIME_WAIT state.
So, the official recommendation changed to use a static HttpClient. But this caused problems where it would never correctly handle DNS updates.
So, the ASP.NET team came up with IHttpClientFactory in .NET Core 2.1, so code (or at least code running on modern platforms) can reuse HttpClient instances (or, more properly, the message handlers of those instances), avoiding the TIME_WAIT problem, but also periodically closing those connections to avoid the DNS problem.
But, at the same time, the .NET team came up with SocketsHttpHandler also in .NET Core 2.1, which also does connection pooling.
So, on modern platforms, you can either use IHttpClientFactory or a static/singleton HttpClient. On older platforms (including .NET Framework), you would use a static/singleton HttpClient and either live with the DNS issue or use other workarounds.
Actually writing this question I noticed the static constructor in the code sample provided from Microsoft. This all makes sense now.
The Static Constructors are used to initialize any static data, or to perform a particular action that needs to be performed only once. It is called automatically before the first instance is created or any static members are referenced.
In the context of WebAPI the static constructor is called one time only thus creating only one HttpClient and reusing it for all other requests.
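A small sketch makes the difference between the two constructors visible. The counters are added purely for the demonstration, and no request is ever sent.

```csharp
using System;
using System.Net.Http;

class GoodController
{
    public static int StaticCtorRuns;   // demo-only counters
    public static int InstanceCtorRuns;

    private static readonly HttpClient HttpClient;

    // Runs once per process, before the first instance is created.
    static GoodController()
    {
        HttpClient = new HttpClient();
        StaticCtorRuns++;
    }

    // Runs for every instance, that is, for every request.
    public GoodController() { InstanceCtorRuns++; }

    public HttpClient Client => HttpClient;
}

static class Program
{
    static void Main()
    {
        var first = new GoodController();
        var second = new GoodController();
        var third = new GoodController();

        Console.WriteLine(GoodController.StaticCtorRuns);                 // prints 1
        Console.WriteLine(GoodController.InstanceCtorRuns);               // prints 3
        Console.WriteLine(ReferenceEquals(first.Client, third.Client));   // prints True
    }
}
```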
I'll never use using(HttpClient....) in production code again.
This is a great article on the wrong usage of HttpClient - YOU'RE USING HTTPCLIENT WRONG AND IT IS DESTABILIZING YOUR SOFTWARE

Azure Functions: binding to DocumentClient versus static instance - what's recommended?

I know how to bind queries directly to an Azure Function and use Cosmos DB triggers in functions.
However, I'm looking for direction around using DocumentClient (Nuget package Microsoft.Azure.Cosmos) directly.
1. There's documentation that explains how to reuse a static client instance between executions.
2. It is also possible to get a DocumentClient instance as a binding by adding [DocumentDB("test", "test", ConnectionStringSetting = "CosmosDB")] DocumentClient client to the function's parameters.
3. Finally, it is possible to create a DocumentClient instance in the function's body: var client = new DocumentClient(...).
I do not find a clear recommendation when to use what approach except that number 3 never is a good option because of performance, memory usage and connection limits. Also, I understand that using a static instance has advantages.
Questions
1. Azure functions have a connection limit (discussed here). Does this also apply when using approach 2 (bind to client)?
2. What are the pros and cons of using approach 2 (binding) versus 1 (static)?
3. What's the advantage of binding to a SQL query compared to binding to a DocumentClient and creating the query in the function's body?
There is another way to use DocumentClient.
Starting Version 1.0.28 of Microsoft.NET.Sdk.Functions, one can now use a FunctionsStartup class to initialize DocumentClient once, and then register it for DI (dependency injection), and then use the same instance every time.
The FunctionsStartup class is documented here. And a better explanation is here.
In your Startup's configure method, build your client.
using Microsoft.Azure.Functions.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection;
[assembly: FunctionsStartup(typeof(MyApp.Startup))]
namespace MyApp
{
public class Startup : FunctionsStartup
{
public override void Configure(IFunctionsHostBuilder builder)
{
IDocumentClient client = GetCustomClient();
builder.Services.AddSingleton<IDocumentClient>(client);
}
}
}
This can be then injected into the function constructor and used by the methods.
public class MyFunction
{
private IDocumentClient _client;
public MyFunction(IDocumentClient client)
{
_client = client;
}
[FunctionName("MyFunction")]
public async Task<IActionResult> Run(
[HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
ILogger log)
{
// use _client here.
}
}
When Azure creates an instance of this class to serve a request, it passes the IDocumentClient instance that was created in FunctionsStartup class.
This strategy allows one to reuse the same instance of DocumentClient. Singleton-ness of this client is not forced by making it static, but by making sure we only create it once. This also helps with testability as tests can inject a different instance of IDocumentClient.
This article makes a good case for a static client.
We all know the woes of this approach for the HttpClient (and if you
don’t, please read it right after this article!), and it has the exact
same effect here: If the Function is getting a high volume of
triggers, we not only will be penalizing the performance of our
database calls with the initialization overhead but the memory
consumption will raise and we might even incur in socket exhaustion
scenarios.
To your questions 2 and 3:
The big pro of using the binding is simplicity. All the creation of the clients etc is abstracted away from you. Con of this is of course control. Here is a good example of using a custom client.
Using the SQL query instead of the DocumentClient is one step further up in regards to abstraction.

Setting up HttpClient in .NET to make it work with multiple threads and to provide concurrency

After reading the posts below about recommended usage of HttpClient, I changed my code from instantiating HttpClient per request within a using block to a long-lived object.
Do HttpClient and HttpClientHandler have to be disposed?
What is the overhead of creating a new HttpClient per call in a WebAPI client?
My implementation is part of a low-level api, and will be used to make requests from different parts of the app running on different threads, so thread-safety and concurrency when making requests need to be guaranteed as well.
I even went on to make it a singleton as below, so there is just one instance of HttpClient used throughout the app. (fourth version from Jon Skeet's article)
http://csharpindepth.com/Articles/General/Singleton.aspx
public sealed class MyHttpClient
{
// a field cannot be both readonly and volatile; readonly is enough here
private static readonly HttpClient _myHttpClient = new HttpClient();
static MyHttpClient() {}
private MyHttpClient(){ }
public static HttpClient MyHttpClientObj
{
get
{
return _myHttpClient;
}
}
}
And below is an example of how this gets used
public IEnumerable<string> GetSomeData(string url, FormUrlEncodedContent bodyParameters)
{
try
{
//is it possible to configure timeout here instead, such that every request will have its one timeout duration?
var response = MyHttpClient.MyHttpClientObj.PostAsync(url, bodyParameters);
var result = response.Result;
if (!result.IsSuccessStatusCode)
{
//log and return null
return null;
}
var data = JsonConvert.DeserializeObject<List<string>>(result.Content.ReadAsStringAsync().Result);
return data;
}
catch (Exception ex)
{
//logging exceptions
return null;
}
}
When making requests via HttpClient, I've made sure to use only the thread-safe methods listed below, but when deserializing the response, result.Content.ReadAsStringAsync().Result is used. This is because the higher level calls don't support async responses yet.
https://msdn.microsoft.com/en-us/library/system.net.http.httpclient(v=vs.110).aspx#Anchor_5
However, I still have a few questions.
1. Is this approach Thread-safe and stable enough to not cause any memory leaks?
2. How can I configure the Timeout for every request?
3. Is it necessary to specify 'Connection: keep-alive' in DefaultHeaders?
4. How can I add a custom header/modify a default header for every request?
5. And finally, Are there any known performance issues/drawbacks in using HttpClient this way?
It's thread safe, and recommended.
Depending on your usage, the biggest thing may be to raise the connection limit to your desired level of concurrency:
ServicePointManager.DefaultConnectionLimit = 16;
Without this set, concurrent requests to a single host will sit in a queue until they can be issued, and they'll time out if they are not reached in time.
I'd also recommend using pipelining to improve performance:
new HttpClient(new WebRequestHandler() { AllowPipelining = true });
1. Yes, this approach is thread-safe, as you call only thread-safe methods and do not need any synchronization logic, so your client threads are simply independent of each other.
2. You can use an overload that accepts a CancellationToken and call CancelAfter on the CancellationTokenSource that produced it; this approach is recommended by MSDN.
3. No, but if your connection does require some interaction between client and server, it is a highly recommended approach for HTTP/1.1: it reduces the overhead of recreating the socket connection and repeating handshakes between the participating sides.
4. You can use the Headers property of the FormUrlEncodedContent class; simply add the header you need to it.
5. The huge drawback of your solution is the .Result call, as it blocks the current thread. You can try to refactor your approach with TaskCompletionSource usage so you could possibly use async methods internally. This will let the threads do something else rather than wait for the result.
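The answers about per-request timeouts and headers can be sketched together. Note that this sample sets the header on an HttpRequestMessage rather than on the content's Headers property mentioned above, because request-level headers such as a trace id belong there. StubHandler, the X-Trace header name and the URL are invented so that the sample runs offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical endpoint stub: waits 500 ms, then echoes the X-Trace header.
sealed class StubHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(500, cancellationToken);
        request.Headers.TryGetValues("X-Trace", out var values);
        return new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(values == null ? "" : string.Join(",", values))
        };
    }
}

static class Program
{
    private static readonly HttpClient Client = new HttpClient(new StubHandler());

    // Sends one request with its own timeout and its own header,
    // leaving the shared client's defaults untouched.
    public static async Task<string> SendAsync(string traceId, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        using (var request = new HttpRequestMessage(HttpMethod.Post, "https://api.example/data"))
        {
            cts.CancelAfter(timeout);                // per-request timeout
            request.Headers.Add("X-Trace", traceId); // per-request header
            using (var response = await Client.SendAsync(request, cts.Token))
            {
                return await response.Content.ReadAsStringAsync();
            }
        }
    }

    static async Task Main()
    {
        Console.WriteLine(await SendAsync("abc", TimeSpan.FromSeconds(5))); // prints abc
        try
        {
            await SendAsync("def", TimeSpan.FromMilliseconds(50));
        }
        catch (OperationCanceledException)
        {
            Console.WriteLine("timed out");
        }
    }
}
```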
The biggest performance issue you will encounter using HttpClient in a highly concurrent environment is that the number of concurrent connections to any given host is limited to 2 by default. You can increase this for all endpoints using ServicePointManager.DefaultConnectionLimit, or get a specific ServicePoint using ServicePointManager.FindServicePoint and set ServicePoint.ConnectionLimit.
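A brief sketch of both knobs follows. The host name and the value 16 are placeholders. As far as I know, the ServicePoint settings only influence HttpClient on .NET Framework, and on newer runtimes the API is marked obsolete and produces a compiler warning; there the per-handler MaxConnectionsPerServer property is the setting that applies.

```csharp
using System;
using System.Net;
using System.Net.Http;

static class Program
{
    static void Main()
    {
        // .NET Framework style: raise the limit for one specific host.
        ServicePoint servicePoint =
            ServicePointManager.FindServicePoint(new Uri("https://api.example/"));
        servicePoint.ConnectionLimit = 16;

        // Handler style: the limit travels with the client that uses it.
        var handler = new HttpClientHandler { MaxConnectionsPerServer = 16 };
        var client = new HttpClient(handler);

        Console.WriteLine(servicePoint.ConnectionLimit);    // prints 16
        Console.WriteLine(handler.MaxConnectionsPerServer); // prints 16
    }
}
```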
