Boosting performance on async web calls - c#

Backgound: I must call a web service call 1500 times which takes roughly 1.3 seconds to complete. (No control over this 3rd party API.) total Time = 1500 * 1.3 = 1950 seconds / 60 seconds = 32 minutes roughly.
I came up with what I though was a good solution however it did not pan out that great.
So I changed the calls to async web calls thinking this would dramatically help my results it did not.
Example Code:
Pre-Optimizations:
foreach (var elmKeyDataElementNamed in findResponse.Keys)
{
var getRequest = new ElementMasterGetRequest
{
Key = new elmFullKey
{
CmpCode = CodaServiceSettings.CompanyCode,
Code = elmKeyDataElementNamed.Code,
Level = filterLevel
}
};
ElementMasterGetResponse getResponse;
_elementMasterServiceClient.Get(new MasterOptions(), getRequest, out getResponse);
elementList.Add(new CodaElement { Element = getResponse.Element, SearchCode = filterCode });
}
With Optimizations:
var tasks = findResponse.Keys.Select(elmKeyDataElementNamed => new ElementMasterGetRequest
{
Key = new elmFullKey
{
CmpCode = CodaServiceSettings.CompanyCode,
Code = elmKeyDataElementNamed.Code,
Level = filterLevel
}
}).Select(getRequest => _elementMasterServiceClient.GetAsync(new MasterOptions(), getRequest)).ToList();
Task.WaitAll(tasks.ToArray());
elementList.AddRange(tasks.Select(p => new CodaElement
{
Element = p.Result.GetResponse.Element,
SearchCode = filterCode
}));
Smaller Sampling Example:
So to easily test I did a smaller sampling of 40 records this took 60 seconds with no optimizations with the optimizations it only took 50 seconds. I would have though it would have been closer to 30 or better.
I used wireshark to watch the transactions come through and realized the async way was not sending as fast I assumed it would have.
Async requests captured
Normal no optimization
You can see that the asnyc pushes a few very fast then drops off...
Also note that between requests 10 and 11 it took nearly 3 seconds.
Is the overhead for creating threads for the tasks that slow that it takes seconds?
Note: The tasks I am referring to are the 4.5 TAP task library.
Why wouldn't the request come faster than that.
I was told the Apache web server I was hitting could hold 200 max threads so I don't see an issue there..
Am I not thinking about this clearly?
When calling web services are there little advantages from async requests?
Do I have a code mistake?
Any ideas would be great.

After many days of searching I found this post that solved my problem:
Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)
The reason that the request was not hitting the server quicker was due too the my client side code and nothing to do with the server. By default C# only allows 2 concurrent requests.
see here: http://msdn.microsoft.com/en-us/library/system.net.servicepointmanager.defaultconnectionlimit.aspx
I simply added this line of code and then all request came through in milliseconds.
System.Net.ServicePointManager.DefaultConnectionLimit = 50;

Related

HttpClient seems to be causing my app to slowdown every ~3 minutes while freeing up a ton of memory

So basically I am running a program which is able to send up to 7,000 HTTP requests every second in average, 24/7, in order to detect last changes on a website as quickly as possible.
However, every 2.5 to 3 minutes in average, my program slowdowns for around 10-15 seconds and goes from ~7K rq/s to less than 1000.
Here are logs from my program, where you can see the amount of requests it sends every second:
https://pastebin.com/029VLxZG
When scrolling down through the logs, you can see it goes slower every ~3 minutes. Example: https://i.imgur.com/US0wPzm.jpeg
At first I thought it was my server's ethernet connection going in a temporary "restricted" mode, and I even tried contacting my host about it. But then I ran 2 instances of my program simulteanously just to see what would happen and I noticed that, even though the issue (downtime) was happening on both, it wasn't always happening at the same time (depending on when the program was started, if you get what I mean), which meant the problem wasn't coming from the internet connection, but my program itself.
I investigated a little bit more, and found out that as soon as my program goes from ~7K rq/s to ~700, a lot of RAM is being freed up on my server.
I have taken 2 screenshots of the consecutive seconds before and once the downtime occurs (including RAM metrics), to compare, and you can view them here: https://imgur.com/a/sk2TYQZ (please note that I was using less threads here, which is why the average "normal" speed is ~2K rq/s instead of ~7K as mentioned before)
If you'd like to see more of it, here is the full record of the issue, in a video which lasts about 40 seconds: https://i.imgur.com/z27FlVP.mp4 - As you can see, after the RAM is freed up, its usage slowly goes up again, before the same process repeats every ~3 minutes.
For more context, here is the method I am using to send the HTTP requests (it is being called from a lot of threads concurrently, as my app is multi-threaded in order to be super fast):
public static async Task<bool> HasChangedAsync(string endpoint, HttpClient httpClient)
{
const string baseAddress = "https://example.com/";
string response = await httpClient.GetStringAsync(baseAddress + endpoint);
return response.Contains("example");
}
One thing I did is I tried replacing the whole method by await Task.Delay(25) then return false, and that fixed the issue, RAM usage was barely increasing.
This lead me to believe the issue is HttpClient / my HTTP requests, and even though I tried replacing the GetStringAsync method by GetAsync using both a HttpRequestMessage and HttpResponseMessage (and disposing them with using), the behavior ended up being the exact same.
So here I am, desperate for a fix, and without enough knowledge about memory, garbage collector etc (if that's even needed here) to be able to fix this myself.
Please, Stack Overflow, do you have any idea?
Thanks a lot.
Your best bet would be to stream the response and then use chunks of it to find what your are looking for. An example implementation could be something as follows:
using var response = await Client.GetAsync(BaseUrl, HttpCompletionOption.ResponseHeadersRead);
await using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);
string line = null;
while ((line = await reader.ReadLineAsync()) != null)
{
if(line.Contains("example"))// do whatever
}

How to crawl XML(s) very fast — considering the below networking limitations?

I have a .Net crawler that's running when the user makes a request (so, it needs to be fast). It crawls 400+ links in real time. (This is the business ask.)
The problem: I need to detect if a link is xml (think of rss or atom feeds) or html. If the link is xml then I continue with processing, but if the link is html I can skip it. Usually, I have 2 xml(s) and 398+ html(s). Currently, I have multiple threads going but the processing is still slow, usually 75 seconds running with 10 threads for 400+ links, or 280 seconds running with 1 thread. (I want to add more threads but see below..)
The challenge that I am facing is that I read the streams as follows:
var request = WebRequest.Create(requestUriString: uri.AbsoluteUri);
// ....
var response = await request.GetResponseAsync();
//....
using (var reader = new StreamReader(stream: response.GetResponseStream(), encoding: encoding)) {
char[] buffer = new char[1024];
await reader.ReadAsync(buffer: buffer, index: 0, count: 1024);
responseText = new string(value: buffer);
}
// parse first byts of reasponseText to check if xml
The problem is that my optimization to get only 1024 is quite useless because the GetResponseAsync is downloading the entire stream anyway, as I see.
(The other option that I have is to look for the header ContentType, but that's quite similar AFAIK because I get the content anyway - in case that you don't recommend to use OPTIONS, that I did not use so far - and in addition xml might be content-type incorrectly marked (?) and I am going to miss some content.)
If there is any optimization that I am missing please help, as I am running out of ideas.
(I do consider to optimize this design by spreading the load on multiple servers, so that I balance the network with the parallelism, but that's a bit of change from the current architecture, that I cannot afford to do at this point in time.)
Using HEAD requests could speed up the requests significantly, IF you can rely on the Content-Type.
e.g
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.SendAsync(new HttpRequestMessage() { Method = HttpMethod.Head});
Just showing basic usage. Obviously you need to add uri and anything else required to the request.
Also just to note that even with 10 threads, 400 request will likely always take quite a while. 400/10 means 40 requests sequentially. Unless the requests are to servers close by then 200ms would be a good response time meaning a minimum of 8 seconds. Ovserseas serves that may be slow could easily push this out to 30-40 seconds of unavoidable delay, unless you increase the amount of threads to parallel more of the requests.
Dataflow (Task Parallel Library) Can be very helpful for writing parallel pipes with a convenient MaxDegreeOfParallelism property for easily adjusting how many parallel instances can be run.

C# batch processing of async web responses hangs just before finishing

Here is the scenario.
I want to call 2 versions of an API (hosted on different servers), then cast their responses (they come as a JSON) to C# objects and compare them.
An important note here is that i need to query the APIs a lot of times ~3000. The reason for this is that I query an endpoint that has an id and that returns a specific object from the DB. So my queries are like http://myapi/v1/endpoint/id. And I basically use a loop to go through all of the ids.
Here is the issue
I start querying the API and for the first 90% of all requests it is blazing fast (I get the response and i process it) and all that happens under 5 seconds.
Then however, I start to come to a stop. The next 50-100 requests can take between 1 - 5 seconds to process and after that I come to a stop. No CPU-usage, network activity is low (and I am pretty sure that activity is from other apps). And my app just hangs.
UPDATE: Around 50% of the times I tested this, it does finally resume after quite a bit of time. But the other 50% it still just hangs
Here is what I am doing in-code
I have a list of Ids that I iterate to query the endpoint.
This is the main piece of code that queries the APIs and processes the responses.
var endPointIds = await GetIds(); // this queries a different endpoint to get all ids, however there are no issues with it
var tasks = endPointIds.Select(async id =>
{
var response1 = await _data.GetData($"{Consts.ApiEndpoint1}/{id}");
var response2 = await _data.GetData($"{Consts.ApiEndpoint2}/{id}");
return ProcessResponces(response1, response2);
});
var res = await Task.WhenAll(tasks);
var result = res.Where(r => r != null).ToList();
return result; // I never get to return the result, the app hangs before this is reached
This is the GetData() method
private async Task<string> GetAsync(string serviceUri)
{
try
{
var request = WebRequest.CreateHttp(serviceUri);
request.ContentType = "application/json";
request.Method = WebRequestMethods.Http.Get;
using (var response = await request.GetResponseAsync())
using (var responseStream = response.GetResponseStream())
using (var streamReader = new StreamReader(responseStream, Encoding.UTF8))
{
return await streamReader.ReadToEndAsync();
}
}
catch
{
return string.Empty;
}
}
I would link the ProcessResponces method as well, however I tried mocking it to return a string like so:
private string ProcessResponces(string responseJson1, string responseJson1)
{
//usually i would have 2 lines that deserialize responseJson1 and responseJson1 here using Newtonsoft.Json's DeserializeObject<>
return "Fake success";
}
And even with this implementation my issue did not go away (only difference it made is that I managed the have fast requests for like 97% of my requests, but my code still ended up stopping at the last few request), so I am guessing the main issue is not related to that method. But what it more or less does is deserialize both responses to c# objects, compares them and returns information about their equality.
Here are my observations after 4 hours of debugging
If I manually reduce the number of queries to my API (I used .Take() method on the list of ids) the issue still persists. For example on 1000 total requests I start hanging around 900th, for 1500 on the 1400th an so on. I believe the issue goes away at around 100-200 requests, but I am not sure since it might just be too fast for me to notice.
Since this is currently a console app I tried adding WriteLines() in some of my methods, and the issue seemed to go away (I am guessing the delay in speed that writing on the console creates, gives some time between requests and that helps)
Lastly i did a concurrency profiling of my app and it reported that there were a lot of contentions happening at the point where my app hangs. Opening the contention tab showed that they are mainly happening with System.IO.StreamReader.ReadToEndAsync()
Thoughts and Questions
Obviously, what can I do to resolve the issue?
Is my GetAsync() method wrong, should I be using something else instead of responseStream and streamReader?
I am not super proficient in asynchronous operations, maybe my flow of async/await operations is wrong.
Lastly, could it be something with the API controllers themselves? They are standard ASP.NET MVC 5 WebAPI controllers (version 5.2.3.0)
After long hours of tracking my requests with Fiddler and finally mocking my DataProvider (_data) to retrieve locally, from disk - it turns out that I had responses that were taking 30s+ to come (or even not coming at all).
Since my .Select() is async it always dispalyed info for the quick responses first, and then came to a halt as it was waiting for the slow ones. This gave an illusion that I was somehow loading the first X amount of requests quickly and then stopping. When, in reality, I was simply shown the fastest X amount of requests and then coming to a halt as I was waiting for the slow ones.
And to kind of answer my questions...
What can I do to resolve the issue - set a timeout that allows a maximum number of milliseconds/seconds for a request to finish.
The GetAsync() method is alright.
Async/await operations are also correct, just need to have in mind that doign an async select will return results ordered by the time it took for them to finish.
The ASP.NET Framework controllers are perfectly fine and do not contribute to the issue.

Factors limiting concurrent outstanding requests

I am trying to achieve a high number of webrequests per second.
With C#, I used multiple threads to send webrequest and find that no matter how many threads I created,
the max number of webrequest is around 70 per second in the condition that a server responds quickly.
I tried to simulate timeout response using fiddler in order to make concurrent outstanding web requests to have a better understanding.
With whatever amount of threads, there are instantly fired 2x requests, afterward, the queued requests fired one by one very slowly although the previous requests were still getting response. Once there were finished requests, the queued requests fired faster to replenish the amount. Its like it takes time to initialize once the pre-initialized amount is reached. Moreover, the response is small enough that bandwidth problem could be neglected.
Below is the code.
I tried in window xp and window 7 in different network. Same thing happens.
public Form1()
{
System.Net.ServicePointManager.DefaultConnectionLimit = 1000;
for (int i = 0; i < 80; i++)
{
int copy = i;
new Thread(() =>
{
submit_test(copy);
}) { IsBackground = true }.Start();
}
}
public void submit_test(int pos)
{
webRequest = (HttpWebRequest)WebRequest.Create("http://www.test.com/");
webRequest.Method = "GET";
using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
{
}
}
Is it the network card limiting the instantly fired amount?
I know that a large server can handle thousands of incoming request concurrently. Isn't it the same as sending out requests ( Establishing connection )?
Please tell me if using a server helps solve the problem.
Update clue:
1) I suspect if the router limiting and unplugged it. No difference.
2) Fiddler show that one queued requests fired exactly every second
3) I used apache benchmarking tool to try to send concurrent timeout request and same thing happens.Not likely to be .Net problem.
4) I try to connect to localhost instead. No difference
5) I used begingetresponse instead and no difference.
6) I suspect if this is fiddler problem. I use wireshark as well to capture traffic. Sensibly, the held outgoing requests are emulated by fiddler and the response was received in fact.
There are not outstanding requests actually. It seems that it is fiddler queuing the requests. I will edit/close the question after I find a better method to test
I had been stuck in this problem for a few days already. Please tell me any single possibility if you could think of.
Finally, I find that my test is not accurate due to an implementation of fiddler. The requests are queued after 2X outstanding requests for unknown reason.
I set up a server and limit its bandwidth to simulate timeout response.
Using wireshark, I can see that 150 SYN could be sent in around 1.4s as soon as my threads are ready.
There is a lot of overhead associated with creating Threads directly. Try using Task factory instead of Thread. Tasks use ThreadPool under the covers, which reuses threads instead of continuously creating them.
for (int i = 0; i < 80; i++)
{
int copy = i;
Task.Factory.StartNew(() =>
{
submit_test(copy);
});
}
Check out this other post on the topic:
Why so much difference in performance between Thread and Task?

How to verify if Parallel.Foreach is working correctly

Inside a Parallel Foreach I'm calling a service that gives me all information about an article, If I try to take the information about only one it takes 6 seconds.
My questions about that:
If I want to take the information about 4 articles, how long will it take? +- 6 seconds??
Actually doing that it takes 27 seconds, There's an easy way to check if it's working on parallel ??
Working with C# MVC3
Code:
private void PopulateArticleDictionary()
{
List<Article> tmpArticleFirstLevel = new List<Article>();
Parallel.ForEach<Article>(ArticlesFirstLevel,
article =>
{
var articleInDepth = ArticleService.SearchByCode(article.Code, article.Code18, article.Quantity, "ES", "EUR");
if (articleInDepth == null)
{
tmpArticleFirstLevel.Add(article);
}
else
{
tmpArticleFirstLevel.Add(articleInDepth);
}
}
);
ArticlesFirstLevel = tmpArticleFirstLevel;
}
Thank's !
How many cores do you have? The Parallel libraries won't spin up extra threads if you don't have the extra processors to take advantage of it. Also there is some overhead, doing parallel on 4 articles probably isn't really worth it unless you have a lot of crunching going on.
Also are you sure your service isn't a bottle neck?
Try this and see if there is an improvement
System.Net.ServicePointManager.DefaultConnectionLimit = 1000;
You could have a bottleneck connecting to your service as .NET out of the box only allows 2 concurrent HTTP pipes to the same host.

Categories