I'm building a SOCKS proxy checker using .NET 4.5 and everything works fine, except when one of the SOCKS proxies is really slow and takes over 100 seconds to respond. I'd like to time those proxies out at a few stages (ConnectAsync, ReadToEndAsync), especially at ReadToEndAsync, because a slow proxy makes it hang.
I've tried everything I could find on this: cancellation tokens, Task.Wait, NetworkStream.ReadTimeout (which, strangely, doesn't work)...
And if I use Task.Wait, I can't use the await keyword, which makes the call synchronous rather than async, and that defeats the whole idea of my tool.
var socksClient = new Socks5ProxyClient(IP, Port);
var googleAddress = await Dns.GetHostAddressesAsync("google.com");
var speedStopwatch = Stopwatch.StartNew();

using (var socksTcpClient = await socksClient.CreateConnection(googleAddress[0].ToString(), 80))
{
    if (socksTcpClient.Connected)
    {
        using (var socksTcpStream = socksTcpClient.GetStream())
        {
            socksTcpStream.ReadTimeout = 5000;
            socksTcpStream.WriteTimeout = 5000; // these don't work..
            using (var writer = new StreamWriter(socksTcpStream))
            {
                await writer.WriteAsync("GET / HTTP/1.1\r\nHost: google.com\r\n\r\n");
                await writer.FlushAsync();
                using (var reader = new StreamReader(socksTcpStream))
                {
                    var result = await reader.ReadToEndAsync(); // up to 250 seconds hang on the thread checking the current proxy..
                    reader.Close();
                    writer.Close();
                    socksTcpStream.Close();
                }
            }
        }
    }
}
Shamefully, async socket IO does not support timeouts. You need to build that yourself. Here is the best approach I know:
Make your entire function not care about timeouts; disable all of them. Then start a delay task and, when it completes, dispose of the socket. This kills all IO that is in flight and effects immediate cancellation.
So you could do:
Task.Delay(TimeSpan.FromSeconds(100)).ContinueWith(_ => socksTcpClient.Dispose());
This leads to an ugly ObjectDisposedException. This is unavoidable.
You probably also need to cancel the delay in case of success. Otherwise you keep a delay task around for 100 seconds per check, and depending on load they can pile up into the millions.
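For example, a rough sketch combining the dispose-on-delay timeout with cancelling the delay on success might look like the following (the helper name, the exception handling and the reuse of the question's Socks5ProxyClient API are illustrative assumptions, not tested code):

async Task<string> CheckProxyAsync(Socks5ProxyClient socksClient, string host, int port)
{
    var socksTcpClient = await socksClient.CreateConnection(host, port);
    var timeoutCts = new CancellationTokenSource();

    // When the delay elapses, dispose the client; this faults any IO still in flight.
    var timeoutTask = Task.Delay(TimeSpan.FromSeconds(100), timeoutCts.Token)
        .ContinueWith(_ => socksTcpClient.Dispose(),
                      TaskContinuationOptions.OnlyOnRanToCompletion);

    try
    {
        using (var stream = socksTcpClient.GetStream())
        using (var writer = new StreamWriter(stream))
        using (var reader = new StreamReader(stream))
        {
            await writer.WriteAsync("GET / HTTP/1.1\r\nHost: " + host + "\r\n\r\n");
            await writer.FlushAsync();
            return await reader.ReadToEndAsync();
        }
    }
    catch (ObjectDisposedException)
    {
        return null; // the timeout disposed the client; treat the proxy as too slow
    }
    catch (IOException)
    {
        return null; // the dispose can also surface as an aborted-IO exception
    }
    finally
    {
        timeoutCts.Cancel();      // success (or failure) path: stop the pending delay
        timeoutCts.Dispose();
        socksTcpClient.Dispose(); // safe even if the timeout already disposed it
    }
}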
I created an HttpClient using IHttpClientFactory and sent 1000 GET calls in parallel to a WebApi, and observed a delay of about 3-5 minutes for each request. Once this batch completed I sent another 1000 GET requests in parallel, and this time there was no delay.
Then I increased the parallel requests to 2000. For the first batch, each request's delay was about 9-11 minutes, and for the second batch of 2000 parallel requests each request's delay was ~5 minutes (whereas with batches of 1000 the second batch had no delay).
var client = _clientFactory.CreateClient();
client.BaseAddress = new Uri("http://localhost:5000");
client.Timeout = TimeSpan.FromMinutes(20);

List<Task> _task = new List<Task>();
for (int i = 1; i <= 4000; i++)
{
    _task.Add(ExecuteRequest(client, i));
    if (i % 2000 == 0)
    {
        await Task.WhenAll(_task);
        _task.Clear();
    }
}

private async Task ExecuteRequest(HttpClient client, int requestId)
{
    var result = await client.GetAsync($"Performance/{requestId}");
    var response = await result.Content.ReadAsStringAsync();
    var data = JsonConvert.DeserializeObject<Response>(response);
}
I'm trying to understand:
how many parallel requests HttpClient supports without delay, and
how to improve the performance of HttpClient for 2000 or more parallel requests.
how many parallel requests HttpClient supports without delay
On modern .NET Core platforms, you're limited only by available memory. There's no built-in throttling that's on by default.
How to improve the performance of HttpClient for 2000 or more parallel requests.
It sounds like you're being throttled by your server. If you want to test a more scalable server, try running this in your server's startup:
var desiredThreads = 2000;
ThreadPool.GetMaxThreads(out _, out var maxIoThreads);
ThreadPool.SetMaxThreads(desiredThreads, maxIoThreads);
ThreadPool.GetMinThreads(out _, out var minIoThreads);
ThreadPool.SetMinThreads(desiredThreads, minIoThreads);
What you're doing is causing worst-case performance for a "cold" HttpClient (one just newed up, or with an empty connection pool).
When you make a new request, it looks for an open connection in the connection pool. When it doesn't find one, it tries to open up a new connection. By throwing a sudden burst at a cold client, most calls to SendAsync will end up trying to open a new connection.
This is a problem because a request that needs a new connection will require multiple round-trips to the server, whereas a request on an existing connection will only require a single round-trip. It gets even worse if you use HTTPS. You're heavily dependent on your network latency in this case.
If you are just benchmarking, then you'll want to benchmark steady-state performance, not warmup performance. Benchmark.NET should more or less do this for you.
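As an illustration of warming up before measuring (my own sketch, reusing the question's endpoint; the warm-up size of 50 is arbitrary):

var client = _clientFactory.CreateClient();
client.BaseAddress = new Uri("http://localhost:5000");

// Warm-up pass: open a modest number of connections first, so the timed
// burst afterwards mostly reuses pooled connections.
var warmup = Enumerable.Range(1, 50)
    .Select(i => client.GetAsync($"Performance/{i}"));
await Task.WhenAll(warmup);

// ... now run the real 2000-request batches against the warmed client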
When your requests complete reasonably quickly, it can be a lot faster to limit your initial concurrency to a smaller percentage of your total requests and slowly ramp up the connection pool size from there, so that subsequent requests can re-use connections. You might try something like the code below, which (as rough behavior, not a guarantee) will only allow 10 new connections to be opened at once:
var sem = new SemaphoreSlim(10);
var client = new HttpClient();

async Task<HttpResponseMessage> MakeRequestAsync(HttpRequestMessage req)
{
    Task t = sem.WaitAsync();
    // If the wait completed synchronously we were under the current limit,
    // so this request is likely the one that opens a fresh connection.
    bool openNew = t.IsCompleted;
    await t;
    try
    {
        return await client.SendAsync(req);
    }
    finally
    {
        // Release an extra permit whenever we (probably) opened a new connection,
        // so the allowed concurrency ramps up as the pool warms.
        sem.Release(openNew ? 2 : 1);
    }
}
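As a usage sketch (my own, reusing the base address from the question), the burst would then go through that wrapper:

var requests = Enumerable.Range(1, 2000)
    .Select(i => MakeRequestAsync(new HttpRequestMessage(HttpMethod.Get, $"http://localhost:5000/Performance/{i}")));
var responses = await Task.WhenAll(requests);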
So I did this project at uni that I am trying to refactor. One of the problems I am having is my method for getting the top list, which consists of around 250 movies, i.e. 250 API calls. After that I render them all on my web page. The API I am using is OMDbAPI, and I am getting every movie individually, as you can see in the code below.
Basically, the web page loads 10 movies by default, but I can also load all movies, which is around 250.
I am trying to wrap my head around asynchronous programming. It takes around 4-6 seconds to run this method according to a Stopwatch, but I believe it should be possible to refactor and speed it up. I am new to asynchronous programming and have looked at the Microsoft documentation and several questions here on SO, but I am not getting anywhere with speeding up the calls.
I have looked at using Parallel for this, but I think my problem should be solvable with async?
With the Stopwatch I have pinpointed the delay to come mostly from between the two //x markers.
I would foremost like to speed up the calls, but I would also love tips on best practices in async programming.
public async Task<List<HomeTopListMovieDTO>> GetTopListAggregatedData(Parameter parameter)
{
    List<Task<HomeTopListMovieDTO>> tasks = new List<Task<HomeTopListMovieDTO>>();
    var toplist = await GetToplist(parameter);
    //x
    foreach (var movie in toplist)
    {
        tasks.Add(GetTopListMovieDetails(movie.ImdbID));
    }
    var results = Task.WhenAll(tasks);
    //x
    var tempToplist = toplist.ToArray();
    for (int i = 0; i < tasks.Count; i++)
    {
        tasks[i].Result.NumberOfLikes = tempToplist[i].NumberOfLikes;
        tasks[i].Result.NumberOfDislikes = tempToplist[i].NumberOfDislikes;
    }
    List<HomeTopListMovieDTO> toplistMovies = results.Result.ToList();
    return toplistMovies;
}

public async Task<HomeTopListMovieDTO> GetTopListMovieDetails(string imdbId)
{
    string urlString = baseUrl + "i=" + imdbId + accessKey;
    return await apiWebClient.GetAsync<HomeTopListMovieDTO>(urlString);
}

public async Task<T> GetAsync<T>(string urlString)
{
    using (HttpClient client = new HttpClient())
    {
        var response = await client.GetAsync(urlString,
            HttpCompletionOption.ResponseHeadersRead);
        response.EnsureSuccessStatusCode();
        var data = await response.Content.ReadAsStringAsync();
        var result = JsonConvert.DeserializeObject<T>(data);
        return result;
    }
}
Your async code looks OK. I would throttle it so it doesn't make more than X parallel requests (for example with a Partitioner / Parallel.ForEach), but the approach with WhenAll is also good enough unless you start seeing connection refused errors because of port exhaustion or the API's DDoS protection.
You should reuse HttpClient; see more details in https://www.aspnetmonsters.com/2016/08/2016-08-27-httpclientwrong. In your case, create the HttpClient in the root method and pass it as a parameter to your async methods. HttpClient is thread safe and can be used for parallel calls.
You should also dispose the HttpResponseMessage.
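A rough sketch of those suggestions combined (the SemaphoreSlim throttle, its limit of 20 and the field names are my own additions, not part of the original answer):

// Shared client, created once and reused; thread safe for parallel calls.
private static readonly HttpClient client = new HttpClient();
private static readonly SemaphoreSlim throttle = new SemaphoreSlim(20); // arbitrary cap on requests in flight

public async Task<T> GetAsync<T>(string urlString)
{
    await throttle.WaitAsync();
    try
    {
        using (var response = await client.GetAsync(urlString, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            var data = await response.Content.ReadAsStringAsync();
            return JsonConvert.DeserializeObject<T>(data);
        }
    }
    finally
    {
        throttle.Release();
    }
}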
I need to fetch content from some 3000 URLs. I'm using HttpClient, creating a Task for each URL, adding the tasks to a list and then awaiting Task.WhenAll. Something like this:
var tasks = new List<Task<string>>();
foreach (var url in urls)
{
    var task = Task.Run(() => httpClient.GetStringAsync(url));
    tasks.Add(task);
}
var t = Task.WhenAll(tasks);
However, many tasks end up in the Faulted or Canceled state. I thought it might be a problem with the specific URLs, but no: I can fetch those URLs in parallel with curl without problems.
I tried HttpClientHandler and WinHttpHandler with various timeouts etc. Several hundred URLs always end with an error.
Then I tried to fetch the URLs in batches of 10, and that works: no errors, but it's very slow. curl fetches 3000 URLs in parallel very fast.
Then I tried to GET httpbin.org 3000 times to verify that the issue is not with my particular URLs:
var handler = new HttpClientHandler() { MaxConnectionsPerServer = 5000 };
var httpClient = new HttpClient(handler);

var tasks = new List<Task<HttpResponseMessage>>();
foreach (var _ in Enumerable.Range(1, 3000))
{
    var task = Task.Run(() => httpClient.GetAsync("http://httpbin.org"));
    tasks.Add(task);
}

var t = Task.WhenAll(tasks);
try { await t.ConfigureAwait(false); } catch { }

int ok = 0, faulted = 0, cancelled = 0;
foreach (var task in tasks)
{
    switch (task.Status)
    {
        case TaskStatus.RanToCompletion: ok++; break;
        case TaskStatus.Faulted: faulted++; break;
        case TaskStatus.Canceled: cancelled++; break;
    }
}
Console.WriteLine($"RanToCompletion: {ok} Faulted: {faulted} Canceled: {cancelled}");
Again, several hundred tasks always end in an error.
So, what is the issue here? Why can't I fetch those URLs with async?
I'm using .NET Core, so the suggestion to use ServicePointManager (Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)) is not applicable.
Also, the URLs I need to fetch point to different hosts. The httpbin code is just a test to show that the problem is not with my URLs being invalid.
As Fildor said in the comments, httpClient.GetStringAsync already returns a Task, so you don't need to wrap it in Task.Run.
I ran this code in a console app. It took 50 seconds to complete. In your comment you wrote that curl performs 3000 requests in less than a minute - the same thing.
var httpClient = new HttpClient();
var tasks = new List<Task<string>>();
var sw = Stopwatch.StartNew();
for (int i = 0; i < 3000; i++)
{
    var task = httpClient.GetStringAsync("http://httpbin.org");
    tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
sw.Stop();
Console.WriteLine(sw.Elapsed);
Console.WriteLine(tasks.All(t => t.IsCompleted));
Also, all requests completed successfully.
In your code you are waiting for the tasks started with Task.Run, but you need to wait for the completion of the tasks started by calling httpClient.Get...
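In other words (my paraphrase of the point above), add the tasks returned by HttpClient itself to the list you await:

// Add the HttpClient task directly; no Task.Run wrapper is needed.
tasks.Add(httpClient.GetStringAsync(url));
// ...
await Task.WhenAll(tasks);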
I am using the .NET 4.5 HttpClient class to make a POST request to a server a number of times. The first 3 calls run quickly, but the fourth time a call to await client.PostAsync(...) is made, it hangs for several seconds before returning the expected response.
using (HttpClient client = new HttpClient())
{
    // Prepare query
    StringBuilder queryBuilder = new StringBuilder();
    queryBuilder.Append("?arg=value");

    // Send query
    using (var result = await client.PostAsync(BaseUrl + queryBuilder.ToString(),
        new StreamContent(streamData)))
    {
        Stream stream = await result.Content.ReadAsStreamAsync();
        return new MyResult(stream);
    }
}
The server code is shown below:
HttpListener listener;

void Run()
{
    listener.Start();
    ThreadPool.QueueUserWorkItem((o) =>
    {
        while (listener.IsListening)
        {
            ThreadPool.QueueUserWorkItem((c) =>
            {
                var context = c as HttpListenerContext;
                try
                {
                    // Handle request
                }
                finally
                {
                    // Always close the stream
                    context.Response.OutputStream.Close();
                }
            }, listener.GetContext());
        }
    });
}
Inserting a debug statement at // Handle request shows that the server code doesn't seem to receive the request as soon as it is sent.
I have already investigated whether it could be a problem with the client not closing the response, meaning that the number of connections the ServicePoint provider allows could be reached. However, I have tried increasing ServicePointManager.MaxServicePoints but this has no effect at all.
I also found this similar question:
.NET HttpClient hangs after several requests (unless Fiddler is active)
I don't believe this is the problem with my code - even changing my code to exactly what is given there didn't fix the problem.
The problem was that there were too many Task instances scheduled to run.
Changing the Task.Factory.StartNew calls for my long-running tasks to use the TaskCreationOptions.LongRunning option fixed this. It appears that the task scheduler was waiting for other tasks to finish before it scheduled the request to the server.
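As a hedged illustration of that change (the LongRunningLoop delegate is a placeholder, not code from the question):

// Before: long-running work occupies a ThreadPool thread and can delay the
// continuations that the async HttpClient calls need.
// Task.Factory.StartNew(() => LongRunningLoop());

// After: LongRunning hints the scheduler to give this work its own dedicated thread.
var worker = Task.Factory.StartNew(
    () => LongRunningLoop(),
    CancellationToken.None,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);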
I'm currently trying to loop through a text file that is about 1.5 GB in size and then use the URLs grabbed from it to pull down the HTML of each site.
For speed I'm trying to process all the HTTP requests on new threads, but since C# is not my strongest language (though a requirement for what I'm doing), I'm a bit confused about good threading practice.
This is how I'm processing the list:
private static void Main()
{
    const Int32 BufferSize = 128;
    using (var fileStream = File.OpenRead("dump.txt"))
    using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
    {
        String line;
        var progress = 0;
        while ((line = streamReader.ReadLine()) != null)
        {
            var stuff = line.Split('|');
            getHTML(stuff[3]);
            progress += 1;
            Console.WriteLine(progress);
        }
    }
}
And I'm pulling down the HTML like so:
private static void getHTML(String url)
{
    new Thread(() =>
    {
        var client = new DecompressGzipResponse();
        var html = client.DownloadString(url);
    }).Start();
}
Though the speeds are fast doing this initially, after about 20 thousand URLs they slow down, and eventually after 32 thousand the application hangs and crashes. I was under the impression that C# threads terminated when the function completed?
Can anyone give any examples/suggestions on how to do this better?
One very reliable way to do this is by using the producer-consumer pattern. You create a thread-safe queue of URLs (for example, a BlockingCollection<Uri>). Your main thread is the producer, which adds items to the queue. You then have multiple consumer threads, each of which reads URLs from the queue and makes the HTTP requests. See BlockingCollection.
Setting it up isn't terribly difficult:
BlockingCollection<Uri> UrlQueue = new BlockingCollection<Uri>();

// Main thread starts the consumer threads
Task t1 = Task.Factory.StartNew(() => ProcessUrls(), TaskCreationOptions.LongRunning);
Task t2 = Task.Factory.StartNew(() => ProcessUrls(), TaskCreationOptions.LongRunning);
// create more tasks if you think necessary.

// Now read your file
foreach (var line in File.ReadLines(inputFileName))
{
    var theUri = ExtractUriFromLine(line);
    UrlQueue.Add(theUri);
}

// when done adding lines to the queue, mark the queue as complete
UrlQueue.CompleteAdding();

// now wait for the tasks to complete.
t1.Wait();
t2.Wait();
// You could also use Task.WaitAll if you have an array of tasks
The individual threads process the URLs with this method:
void ProcessUrls()
{
    foreach (var uri in UrlQueue.GetConsumingEnumerable())
    {
        // code here to do a web request on that url
    }
}
That's a simple and reliable way to do things, but it's not especially quick. You can do much better by using a second queue of WebClient objects that make asynchronous requests. For example, say you want to have at most 15 asynchronous requests in flight. You start the same way with a BlockingCollection, but you only have one persistent consumer thread.
const int MaxRequests = 15;
BlockingCollection<WebClient> Clients = new BlockingCollection<WebClient>();

// start a single consumer thread
var ProcessingThread = Task.Factory.StartNew(() => ProcessUrls(), TaskCreationOptions.LongRunning);

// Create the WebClient objects and add them to the queue
for (var i = 0; i < MaxRequests; ++i)
{
    var client = new WebClient();
    // Add an event handler for the DownloadDataCompleted event
    client.DownloadDataCompleted += DownloadDataCompletedHandler;
    // And add this client to the queue
    Clients.Add(client);
}

// add the code from above that reads the file and populates the queue
Your processing function is somewhat different:
void ProcessUrls()
{
    foreach (var uri in UrlQueue.GetConsumingEnumerable())
    {
        // Wait for an available client
        var client = Clients.Take();
        // and make an asynchronous request
        client.DownloadDataAsync(uri, client);
    }

    // When the queue is empty, you need to wait for all of the
    // clients to complete their requests.
    // You know they're all done when you dequeue all of them.
    for (int i = 0; i < MaxRequests; ++i)
    {
        var client = Clients.Take();
        client.Dispose();
    }
}
Your DownloadDataCompleted event handler does something with the data that was downloaded, and then adds the WebClient instance back to the queue of clients.
void DownloadDataCompletedHandler(Object sender, DownloadDataCompletedEventArgs e)
{
    // The data downloaded is in e.Result
    // be sure to check the e.Error and e.Cancelled values to determine if an error occurred

    // do something with the data

    // And then add the client back to the queue
    WebClient client = (WebClient)e.UserState;
    Clients.Add(client);
}
This should keep you going with 15 concurrent requests, which is about all you can do without getting a bit more complicated. Your system can likely handle many more concurrent requests, but the way that WebClient starts asynchronous requests requires some synchronous work up front, and that overhead makes 15 about the maximum number you can handle.
You might be able to have multiple threads initiating the asynchronous requests. In that case, you could potentially have as many threads as you have processor cores. So on a quad core machine, you could have the main thread and three consumer threads. With three consumer threads this technique could give you 45 concurrent requests. I'm not certain that it scales that well, but it might be worth a try.
There are ways to have hundreds of concurrent requests, but they're quite a bit more complicated to implement.
You need thread management.
My advice is to use Tasks instead of creating your own Threads.
By using the Task Parallel Library, you let the runtime deal with thread management. By default it will run your tasks on ThreadPool threads, allowing a level of concurrency that depends on the number of CPU cores you have, and it will reuse existing threads when they become available instead of wasting time creating new ones.
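For instance, here is a rough sketch of the original loop reworked with Tasks and a concurrency cap (swapping WebClient for HttpClient and choosing a SemaphoreSlim limit of 50 are my own assumptions, not part of the original code):

private static readonly HttpClient httpClient = new HttpClient();
private static readonly SemaphoreSlim throttle = new SemaphoreSlim(50); // arbitrary cap on concurrent downloads

private static async Task GetHtmlAsync(string url)
{
    await throttle.WaitAsync();
    try
    {
        var html = await httpClient.GetStringAsync(url);
        // process the html here
    }
    finally
    {
        throttle.Release();
    }
}

// In Main: collect the tasks while reading the file, then wait for all of them,
// e.g. tasks.Add(GetHtmlAsync(stuff[3])); ... await Task.WhenAll(tasks);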
If you want to get more advanced, you can create your own task scheduler to manage the scheduling aspect yourself.
See also What is difference between Task and Thread?