I am trying to achieve a high number of web requests per second.
With C#, I used multiple threads to send web requests and found that no matter how many threads I created,
the maximum is around 70 requests per second, even when the server responds quickly.
To better understand concurrent outstanding requests, I used Fiddler to simulate timeout responses.
With whatever number of threads, 2X requests are fired instantly; after that, the queued requests fire one by one very slowly, even though the earlier requests are still waiting for responses. Once some requests finish, the queued requests fire faster to replenish that amount. It's as if something has to be initialized again once the pre-initialized amount is used up. Moreover, the responses are small enough that bandwidth can be neglected.
Below is the code.
I tried on Windows XP and Windows 7, on different networks. The same thing happens.
public Form1()
{
    System.Net.ServicePointManager.DefaultConnectionLimit = 1000;
    for (int i = 0; i < 80; i++)
    {
        int copy = i;
        new Thread(() =>
        {
            submit_test(copy);
        }) { IsBackground = true }.Start();
    }
}

public void submit_test(int pos)
{
    var webRequest = (HttpWebRequest)WebRequest.Create("http://www.test.com/");
    webRequest.Method = "GET";
    using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
    {
    }
}
Is it the network card limiting the amount that can be fired instantly?
I know that a large server can handle thousands of incoming requests concurrently. Isn't that the same as sending out requests (establishing connections)?
Please tell me if using a server helps solve the problem.
Update clues:
1) I suspected the router might be the limit, so I unplugged it. No difference.
2) Fiddler shows that one queued request fires exactly every second.
3) I used the Apache benchmarking tool to send concurrent requests against the timing-out server, and the same thing happens. So it is not likely to be a .NET problem.
4) I tried connecting to localhost instead. No difference.
5) I used BeginGetResponse instead, and no difference.
6) I suspected this might be a Fiddler problem, so I used Wireshark as well to capture traffic. Evidently, the held outgoing requests are emulated by Fiddler and the responses were in fact received.
So there are actually no outstanding requests. It seems that it is Fiddler queuing the requests. I will edit/close the question after I find a better method to test.
I have been stuck on this problem for a few days already. Please tell me about any possibility you can think of.
Finally, I found that my test was not accurate because of how Fiddler is implemented: it queues requests after 2X outstanding requests, for an unknown reason.
I set up a server and limited its bandwidth to simulate timeout responses.
Using Wireshark, I can see that 150 SYNs can be sent in around 1.4 s as soon as my threads are ready.
There is a lot of overhead associated with creating Threads directly. Try using Task factory instead of Thread. Tasks use ThreadPool under the covers, which reuses threads instead of continuously creating them.
for (int i = 0; i < 80; i++)
{
    int copy = i;
    Task.Factory.StartNew(() =>
    {
        submit_test(copy);
    });
}
Check out this other post on the topic:
Why so much difference in performance between Thread and Task?
Related
So basically I am running a program which is able to send up to 7,000 HTTP requests per second on average, 24/7, in order to detect the latest changes on a website as quickly as possible.
However, every 2.5 to 3 minutes on average, my program slows down for around 10-15 seconds and goes from ~7K rq/s to less than 1,000.
Here are logs from my program, where you can see the number of requests it sends every second:
https://pastebin.com/029VLxZG
When scrolling down through the logs, you can see it goes slower every ~3 minutes. Example: https://i.imgur.com/US0wPzm.jpeg
At first I thought it was my server's ethernet connection going into a temporary "restricted" mode, and I even tried contacting my host about it. But then I ran 2 instances of my program simultaneously just to see what would happen, and I noticed that even though the issue (downtime) was happening on both, it wasn't always happening at the same time (depending on when each program was started, if you get what I mean), which meant the problem wasn't coming from the internet connection but from my program itself.
I investigated a little bit more and found out that as soon as my program drops from ~7K rq/s to ~700, a lot of RAM is freed up on my server.
I have taken 2 screenshots of the consecutive seconds before and after the downtime occurs (including RAM metrics) to compare, and you can view them here: https://imgur.com/a/sk2TYQZ (please note that I was using fewer threads here, which is why the average "normal" speed is ~2K rq/s instead of ~7K as mentioned before).
If you'd like to see more of it, here is the full record of the issue, in a video which lasts about 40 seconds: https://i.imgur.com/z27FlVP.mp4 - As you can see, after the RAM is freed up, its usage slowly goes up again, before the same process repeats every ~3 minutes.
For more context, here is the method I am using to send the HTTP requests (it is being called from a lot of threads concurrently, as my app is multi-threaded in order to be super fast):
public static async Task<bool> HasChangedAsync(string endpoint, HttpClient httpClient)
{
    const string baseAddress = "https://example.com/";
    string response = await httpClient.GetStringAsync(baseAddress + endpoint);
    return response.Contains("example");
}
One thing I did is try replacing the whole method with await Task.Delay(25) followed by return false, and that fixed the issue; RAM usage was barely increasing.
This led me to believe the issue is HttpClient / my HTTP requests, and even though I tried replacing the GetStringAsync call with GetAsync using both an HttpRequestMessage and an HttpResponseMessage (and disposing them with using), the behavior ended up being exactly the same.
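For reference, here is a minimal sketch of the per-second GC logging I could use to check whether the garbage collector lines up with the drops (the GcProbe class below is only illustrative, not part of my program):
using System;
using System.Threading;

static class GcProbe
{
    private static Timer s_timer;

    public static void Start()
    {
        // Log GC activity once per second; a jump in Gen2 collections that
        // coincides with the request-rate drop would point at the GC.
        s_timer = new Timer(_ =>
        {
            Console.WriteLine(
                $"Gen0={GC.CollectionCount(0)} Gen1={GC.CollectionCount(1)} " +
                $"Gen2={GC.CollectionCount(2)} Heap={GC.GetTotalMemory(false) / 1_048_576} MB");
        }, null, dueTime: 0, period: 1000);
    }
}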
So here I am, desperate for a fix, and without enough knowledge about memory, garbage collector etc (if that's even needed here) to be able to fix this myself.
Please, Stack Overflow, do you have any idea?
Thanks a lot.
Your best bet would be to stream the response and then use chunks of it to find what you are looking for. An example implementation could be something like the following:
using var response = await Client.GetAsync(BaseUrl, HttpCompletionOption.ResponseHeadersRead);
await using var stream = await response.Content.ReadAsStreamAsync();
using var reader = new StreamReader(stream);

string line;
while ((line = await reader.ReadLineAsync()) != null)
{
    if (line.Contains("example"))
    {
        // do whatever
    }
}
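Folding that into the question's method, a minimal sketch of a streaming HasChangedAsync (same endpoint/httpClient parameters as above; a sketch of the idea, not the exact fix):
public static async Task<bool> HasChangedAsync(string endpoint, HttpClient httpClient)
{
    const string baseAddress = "https://example.com/";

    // Ask for the headers only; the body is pulled on demand while we read it line by line.
    using var response = await httpClient.GetAsync(
        baseAddress + endpoint, HttpCompletionOption.ResponseHeadersRead);
    await using var stream = await response.Content.ReadAsStreamAsync();
    using var reader = new StreamReader(stream);

    string line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        if (line.Contains("example"))
            return true; // stop reading as soon as the marker is found
    }
    return false;
}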
I have a .Net crawler that's running when the user makes a request (so, it needs to be fast). It crawls 400+ links in real time. (This is the business ask.)
The problem: I need to detect if a link is XML (think of RSS or Atom feeds) or HTML. If the link is XML then I continue with processing, but if the link is HTML I can skip it. Usually, I have 2 XMLs and 398+ HTMLs. Currently, I have multiple threads going, but the processing is still slow: usually 75 seconds running with 10 threads for 400+ links, or 280 seconds with 1 thread. (I want to add more threads, but see below.)
The challenge that I am facing is that I read the streams as follows:
var request = WebRequest.Create(requestUriString: uri.AbsoluteUri);
// ....
var response = await request.GetResponseAsync();
//....
using (var reader = new StreamReader(stream: response.GetResponseStream(), encoding: encoding)) {
    char[] buffer = new char[1024];
    int charsRead = await reader.ReadAsync(buffer: buffer, index: 0, count: 1024);
    responseText = new string(value: buffer, startIndex: 0, length: charsRead);
}
// parse the first bytes of responseText to check if it is xml
The problem is that my optimization of reading only 1024 characters is fairly useless, because as far as I can see GetResponseAsync downloads the entire stream anyway.
(The other option I have is to look at the Content-Type header, but AFAIK that is much the same because I get the content anyway - unless you recommend an OPTIONS request, which I have not used so far - and in addition the XML might have an incorrectly marked content type (?) and I would miss some content.)
If there is any optimization that I am missing please help, as I am running out of ideas.
(I am considering optimizing this design by spreading the load across multiple servers, so that I balance the network against the parallelism, but that is a bigger change to the current architecture than I can afford at this point in time.)
Using HEAD requests could speed up the requests significantly, IF you can rely on the Content-Type.
e.g.
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.SendAsync(new HttpRequestMessage() { Method = HttpMethod.Head});
Just showing basic usage; obviously you need to add the URI and anything else required to the request.
Also note that even with 10 threads, 400 requests will likely always take quite a while. 400/10 means 40 sequential requests per thread. Unless the requests go to servers close by, 200 ms would be a good response time, which already means a minimum of 8 seconds. Overseas servers that may be slow could easily push this out to 30-40 seconds of unavoidable delay, unless you increase the number of threads to run more of the requests in parallel.
Dataflow (Task Parallel Library) can be very helpful for writing parallel pipelines, with a convenient MaxDegreeOfParallelism property for easily adjusting how many parallel instances can run.
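A minimal sketch of what that could look like for the crawl (the System.Threading.Tasks.Dataflow package is required; FetchAndClassifyAsync stands in for your existing per-link work, and 10 is just an example degree of parallelism):
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

async Task CrawlAsync(IEnumerable<Uri> uris)
{
    // Process at most 10 links at a time; Post queues the rest.
    var block = new ActionBlock<Uri>(
        uri => FetchAndClassifyAsync(uri), // your existing download + xml/html check
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });

    foreach (var uri in uris)
        block.Post(uri);

    block.Complete();
    await block.Completion; // completes once every queued link has been processed
}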
I have an HTTP client which basically invokes multiple web requests against an HTTP server. I execute each HTTP request on a thread pool thread (as a synchronous call), and by default use 30 TCP connections (via HttpWebRequest.ServicePoint - http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.servicepoint.aspx). Depending on the system I am managing, there can be ~500/1000 thread pool threads waiting on I/O (the HTTP response).
Now, I am wondering: do I need to limit the number of threads I use as well? (For example, http://msdn.microsoft.com/en-us/library/ee789351(v=vs.110).aspx System.Threading.Tasks - Limit the number of concurrent Tasks.)
EDIT
Yes, I think I need to limit the number of threads I use, since even though these threads are in a wait state they take up resources. This way I can control the number of resources/threads I use, which makes it easier for my component to be integrated with others without starving them or causing contention for resources/threads.
EDIT 2
I have decided to completely embrace the async model, so that I won't be using thread pool threads to execute HTTP requests; instead I can simply rely on the collaboration of the OS kernel and the I/O completion port threads, which ensures that upon completion the response is delivered in a callback (this way I make the best use of the CPU as well as of resources). I am currently thinking of using WebClient.UploadDataTaskAsync (http://msdn.microsoft.com/en-us/library/system.net.webclient.uploaddatataskasync(v=vs.110).aspx) and updating the code accordingly. (A couple of references for details: HttpWebRequest and I/O completion ports, How does .NET make use of IO Threads or IO Completion Ports?)
EDIT 3
Basically, I have used the async network I/O .NET APIs mentioned above, which essentially removed the use of my parallel library. For details, please see the answer below (I have added it for convenience, just in case anyone is interested!).
Pseudo code to give an idea of how I am invoking web requests using WebClient:
//pseudo code to represent that there can be a variable number of requests
//these can be ~500 to ~1000
var sem = new Semaphore(initialCount: 50, maximumCount: 50);
foreach (var request in requests)
{
    //pseudo code which basically executes the web request on a thread pool thread
    //MY QUESTION: Is it OK to create as many worker threads as there are requests
    //and simply let them wait on a semaphore, or should I limit the concurrency?
    MyThreadPoolConcurrentLibrary.ExecuteAction(() =>
    {
        try
        {
            //using a semaphore because the HTTP server I am talking to recommends
            //sending at most '50' parallel requests over '30' TCP connections
            sem.WaitOne();
            //using my custom web client, so that I can configure 'tcp' connections
            //(service point connection limit), ssl validation etc.
            using (MyCustomWebClient client = new MyCustomWebClient())
            {
                //http://msdn.microsoft.com/en-us/library/tdbbwh0a(v=vs.110).aspx
                //basically the worker thread simply waits here
                client.UploadData(address: "urladdress", data: bytesdata);
            }
        }
        finally
        {
            sem.Release(1);
        }
    });
}
MyThreadPoolConcurrentLibrary.WaitAll(/*...*/);
Basically, should I do something to limit the number of threads I consume, or should I let the thread pool take care of it (i.e. if my app reaches the thread pool's maximum thread limit, it queues the request anyway - so I can simply rely on that)?
*Pseudo code which shows my custom WebClient, where I configure TCP connections, SSL validation etc.:
class MyCustomWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        request.KeepAlive = true;
        request.Timeout = 300;
        request.ServicePoint.ConnectionLimit = TCPConnectionsLimit;
        request.ServerCertificateValidationCallback = this.ServerCertificateValidationCallback;
        return request;
    }

    private bool ServerCertificateValidationCallback(object sender, System.Security.Cryptography.X509Certificates.X509Certificate certificate, System.Security.Cryptography.X509Certificates.X509Chain chain, System.Net.Security.SslPolicyErrors sslPolicyErrors)
    {
        throw new NotImplementedException();
    }
}
Best Regards.
Since I am performing network I/O (HTTP web requests), it is not a good idea to use synchronous HttpWebRequests and let thread pool threads block in sync calls. So, as per the suggestions from the comments, I used async network I/O operations (WebClient's async task methods), as mentioned above in the question. This automatically removed the use of a number of threads in my component - for details, please see the code snippet below...
Here are some useful links that helped me to adapt to a few of the C# 5.0 async concepts (async/await) easily:
Deep Dive Video (good explanation of async/await state machine) http://channel9.msdn.com/events/TechDays/Techdays-2014-the-Netherlands/Async-programming-deep-dive
http://blog.stephencleary.com/2013/11/there-is-no-thread.html
async/await error handling: http://www.interact-sw.co.uk/iangblog/2010/11/01/csharp5-async-exceptions , http://msdn.microsoft.com/en-us/library/0yd65esw.aspx , How to better understand the code/statements from "Async - Handling multiple Exceptions" article?
Nice book: http://www.amazon.com/Asynchronous-Programming-NET-Richard-Blewett/dp/1430259205
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static SemaphoreSlim s_sem = new SemaphoreSlim(90, 90);
    static List<Task> s_tasks = new List<Task>();

    public static void Main()
    {
        for (int request = 1; request <= 1000; request++)
        {
            var task = FetchData();
            s_tasks.Add(task);
        }
        Task.WaitAll(s_tasks.ToArray());
    }

    private static async Task<string> FetchData()
    {
        // throttle: at most 90 requests in flight, without blocking a thread while waiting
        await s_sem.WaitAsync().ConfigureAwait(continueOnCapturedContext: false);
        try
        {
            using (var wc = new MyCustomWebClient())
            {
                string content = await wc.DownloadStringTaskAsync(
                    new Uri("http://www.interact-sw.co.uk/oops/")).ConfigureAwait(continueOnCapturedContext: false);
                return content;
            }
        }
        finally
        {
            s_sem.Release(1);
        }
    }

    private class MyCustomWebClient : WebClient
    {
        protected override WebRequest GetWebRequest(Uri address)
        {
            var req = (HttpWebRequest)base.GetWebRequest(address);
            req.ServicePoint.ConnectionLimit = 30;
            return req;
        }
    }
}
Regards.
You could always simply aim for the same limit that browsers run under. That way the server admins can't really hate on you too much.
Now, the RFC says that you should limit connections to 2 per domain, but according to
http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections/
many browsers go as high as 6 or 8 parallel connections (and this was in 2008).
Browser          HTTP/1.1   HTTP/1.0
IE 6,7           2          4
IE 8             6          6
Firefox 2        2          8
Firefox 3        6          6
Safari 3,4       4          4
Chrome 1,2       6          ?
Chrome 3         4          4
Chrome 4+        6          ?
iPhone 2         4          ?
iPhone 3         6          ?
iPhone 4         4          ?
Opera 9.63       4          4
Opera 10.51+     8          ?
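In .NET, the matching knob for outgoing requests is the ServicePoint connection limit; a small sketch of capping it at a browser-like 6 per host (this applies to the classic HttpWebRequest/WebClient stack; the URL is just an example):
// The default for a desktop application is only 2 connections per host.
System.Net.ServicePointManager.DefaultConnectionLimit = 6;

// Or per host:
var servicePoint = System.Net.ServicePointManager.FindServicePoint(new Uri("http://www.example.com/"));
servicePoint.ConnectionLimit = 6;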
I believe after lengthy research and searching, I have discovered that what I want to do is probably better served by setting up an asynchronous connection and terminating it after the desired timeout... But I will go ahead and ask anyway!
Quick snippet of code:
HttpWebRequest webReq = (HttpWebRequest)HttpWebRequest.Create(url);
webReq.Timeout = 5000;
HttpWebResponse response = (HttpWebResponse)webReq.GetResponse();
// this takes ~20+ sec on servers that aren't on the proper port, etc.
I have an HttpWebRequest method that is in a multi-threaded application, in which I am connecting to a large number of company web servers. In cases where the server is not responding, the HttpWebRequest.GetResponse() is taking about 20 seconds to time out, even though I have specified a timeout of only 5 seconds. In the interest of getting through the servers on a regular interval, I want to skip those taking longer than 5 seconds to connect to.
So the question is: "Is there a simple way to specify/decrease a connection timeout for a WebRequest or HttpWebRequest?"
I believe that the problem is that the WebRequest measures the time only after the request is actually made. If you submit multiple requests to the same address then the ServicePointManager will throttle your requests and only actually submit as many concurrent connections as the value of the corresponding ServicePoint.ConnectionLimit which by default gets the value from ServicePointManager.DefaultConnectionLimit. Application CLR host sets this to 2, ASP host to 10. So if you have a multithreaded application that submits multiple requests to the same host only two are actually placed on the wire, the rest are queued up.
I have not researched this conclusively enough to say whether this is what really happens, but on a similar project I worked on things were horrible until I removed the ServicePoint limitation.
Another factor to consider is the DNS lookup time. Again, this is a belief of mine not backed by hard evidence, but I think the WebRequest does not count the DNS lookup time against the request timeout. DNS lookup time can show up as a very big factor on some deployments.
And yes, you must code your app around WebRequest.BeginGetRequestStream (for POSTs with content) and WebRequest.BeginGetResponse (for GETs and POSTs). Synchronous calls will not scale (I won't go into details why, but that I do have hard evidence for). Anyway, the ServicePoint issue is orthogonal to this: the queueing behavior happens with async calls too.
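For reference, a rough sketch of the BeginGetResponse pattern (error handling omitted; the URL and the connection limit of 100 are only examples):
var request = (HttpWebRequest)WebRequest.Create("http://www.example.com/");
request.ServicePoint.ConnectionLimit = 100; // lift the default per-host throttle of 2

request.BeginGetResponse(ar =>
{
    // Runs on an I/O completion thread once the response (or a timeout) arrives.
    var req = (HttpWebRequest)ar.AsyncState;
    using (var response = (HttpWebResponse)req.EndGetResponse(ar))
    {
        // consume response.GetResponseStream() here
    }
}, request);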
Sorry for tacking on to an old thread, but I think something that was said above may be incorrect/misleading.
From what I can tell .Timeout is NOT the connection time, it is the TOTAL time allowed for the entire life of the HttpWebRequest and response. Proof:
I set:
.Timeout=5000
.ReadWriteTimeout=32000
The connect and post time for the HttpWebRequest took 26 ms,
but the subsequent call to HttpWebRequest.GetResponse() timed out in 4974 ms, thus proving that the 5000 ms was the time limit for the whole send request/get response set of calls.
I didn't verify whether the DNS name resolution was measured as part of the time, as this is irrelevant to me; none of this works the way I really need it to. My intention was to time out more quickly when connecting to systems that weren't accepting connections, as shown by them failing during the connect phase of the request.
For example: I'm willing to wait 30 seconds on a connection request that has a chance of returning a result, but I only want to burn 10 seconds waiting to send a request to a host that is misbehaving.
Something I found later which helped is the .ReadWriteTimeout property. This, in addition to the .Timeout property, finally seemed to cut down on the time threads would spend trying to download from a problematic server. The default for .ReadWriteTimeout is 5 minutes, which for my application was far too long.
So, it seems to me:
.Timeout = time spent trying to establish a connection (not including lookup time)
.ReadWriteTimeout = time spent trying to read or write data after connection established
More info: HttpWebRequest.ReadWriteTimeout Property
Edit:
Per #KyleM's comment, the Timeout property is for the entire connection attempt, and reading up on it at MSDN shows:
Timeout is the number of milliseconds that a subsequent synchronous request made with the GetResponse method waits for a response, and the GetRequestStream method waits for a stream. The Timeout applies to the entire request and response, not individually to the GetRequestStream and GetResponse method calls. If the resource is not returned within the time-out period, the request throws a WebException with the Status property set to WebExceptionStatus.Timeout.
(Emphasis mine.)
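Put into code, the two properties look something like this (the values are only examples):
var request = (HttpWebRequest)WebRequest.Create("http://www.example.com/");

// Covers the whole GetRequestStream/GetResponse call, per the MSDN quote above.
request.Timeout = 5000;            // 5 seconds

// Covers subsequent reads/writes on the request and response streams;
// the default is 5 minutes, which is often far too long.
request.ReadWriteTimeout = 32000;  // 32 seconds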
From the documentation of the HttpWebRequest.Timeout property:
A Domain Name System (DNS) query may take up to 15 seconds to return or time out. If your request contains a host name that requires resolution and you set Timeout to a value less than 15 seconds, it may take 15 seconds or more before a WebException is thrown to indicate a timeout on your request.
Is it possible that your DNS query is the cause of the timeout?
No matter what we tried we couldn't manage to get the timeout below 21 seconds when the server we were checking was down.
To work around this we combined a TcpClient check to see if the domain was alive followed by a separate check to see if the URL was active
public static bool IsUrlAlive(string aUrl, int aTimeoutSeconds)
{
    try
    {
        //check the domain first
        if (IsDomainAlive(new Uri(aUrl).Host, aTimeoutSeconds))
        {
            //only now check the url itself
            var request = System.Net.WebRequest.Create(aUrl);
            request.Method = "HEAD";
            request.Timeout = aTimeoutSeconds * 1000;
            var response = (HttpWebResponse)request.GetResponse();
            return response.StatusCode == HttpStatusCode.OK;
        }
    }
    catch
    {
    }
    return false;
}

private static bool IsDomainAlive(string aDomain, int aTimeoutSeconds)
{
    try
    {
        using (TcpClient client = new TcpClient())
        {
            var result = client.BeginConnect(aDomain, 80, null, null);
            var success = result.AsyncWaitHandle.WaitOne(TimeSpan.FromSeconds(aTimeoutSeconds));
            if (!success)
            {
                return false;
            }
            // we have connected
            client.EndConnect(result);
            return true;
        }
    }
    catch
    {
    }
    return false;
}
What is a reasonable amount of time to wait for a web request to return? I know this is maybe a little loaded as a question, but all I am trying to do is verify if a web page is available.
Maybe there is a better way?
try
{
    // Create the web request
    HttpWebRequest request = WebRequest.Create(this.getUri()) as HttpWebRequest;
    if (request != null)
    {
        request.Credentials = System.Net.CredentialCache.DefaultCredentials;
        // 2 minutes for timeout
        request.Timeout = 120 * 1000;
        // Get response
        response = request.GetResponse() as HttpWebResponse;
        connectedToUrl = processResponseCode(response);
    }
    else
    {
        logger.Fatal(getFatalMessage());
        string error = string.Empty;
    }
}
catch (WebException we)
{
    ...
}
catch (Exception e)
{
    ...
}
You need to consider how long the consumer of the web service is going to take. E.g. if you are connecting to a DB web server and you run a lengthy query, you need to make the web service timeout longer than the time the query will take. Otherwise, the web service will (erroneously) time out.
I also use something like (consumer time) + 10 seconds.
Offhand I'd allow 10 seconds, but it really depends on what kind of network connection the code will be running with. Try running some test pings over a period of a few days/weeks to see what the typical response time is.
I would measure how long it takes for pages that do exist to respond. If they all respond in about the same amount of time, then I would set the timeout period to approximately double that amount.
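A rough sketch of that idea (the helper and the list of known-good URLs are made up for illustration):
static int MeasureTimeoutMs(string[] knownGoodUrls)
{
    long totalMs = 0;
    foreach (var url in knownGoodUrls)
    {
        var sw = System.Diagnostics.Stopwatch.StartNew();
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (request.GetResponse()) { }   // time a full request/response round trip
        totalMs += sw.ElapsedMilliseconds;
    }
    // Roughly double the typical response time.
    return (int)(2 * totalMs / knownGoodUrls.Length);
}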
Just wanted to add that a lot of the time I'll use an adaptive timeout. Could be a simple metric like:
period += (numTimeouts/numRequests > .01 ? someConstant: 0);
checked whenever you hit a timeout to try and keep timeouts under 1% (for example). Just be careful about decrementing it too low :)
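A rough C# sketch of that kind of adaptive timeout (the field names and constants are made up for illustration):
int _timeoutMs = 10000;          // current timeout handed to each request
int _numRequests, _numTimeouts;
const int MinTimeoutMs = 2000;

// Called after each request completes or times out.
void AdjustTimeout(bool timedOut)
{
    _numRequests++;
    if (timedOut) _numTimeouts++;

    // Widen the timeout while more than 1% of requests are timing out...
    if ((double)_numTimeouts / _numRequests > 0.01)
        _timeoutMs += 500;       // someConstant
    // ...otherwise shrink it slowly, but never below a floor.
    else if (_timeoutMs > MinTimeoutMs)
        _timeoutMs -= 100;
}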
The reasonable amount of time to wait for a web request may differ from one server to the next. If a server is at the far end of a high-delay link then clearly it will take longer to respond than when it is in the next room. But two minutes seems like it's more than ample time for a server to respond. The default timeout value for the PING command is expressed in seconds, not minutes. I suggest you look into the timeout values that are used by networking utilities like PING or TRACERT for inspiration.
I guess this depends on two things:
network speed/load (as others wrote, using ping might give you an idea about this)
the kind of page you are calling: e.g. is it a static HTML page or is it a page which might do some time-consuming operations (DB access, etc.)
Anyway, I think 2 minutes is a lot of time. I would definitely reduce the timeout to less than 30 seconds.
I realize this doesn't directly answer your question, but then an "answer" to this question is a little tough. Anyway, a tool I've used in the past is Gomez, to measure page load times from various parts of the world. It's free, and if you haven't done this kind of testing before it might be helpful in giving you a firm idea of what typical page load times are for a given page from a given location.
I would only wait 30 seconds (max), probably closer to 15. It really depends on what you are doing and what the result of an unsuccessful connection is. As I am sure you know, there are lots of reasons why you could get a timeout...