I have a thread that runs periodically, every 60 seconds. This thread gets a response from a web URL. Everything is fine until the third run; after that it doesn't work anymore and shows this error:
"The operation has timed out"
This is my code; the error occurs on line 5 (the GetResponse call). Thanks!
string sURL;
sURL = "http://www.something.com";
WebRequest wrGETURL;
wrGETURL = WebRequest.Create(sURL);
HttpWebResponse http = (HttpWebResponse)wrGETURL.GetResponse();
Stream objStream = null;
objStream = http.GetResponseStream();
You might want to consider using the using statement:
string sURL = "http://www.something.com";
WebRequest wrGETURL = WebRequest.Create(sURL); // WebRequest itself does not implement IDisposable
using (HttpWebResponse http = (HttpWebResponse)wrGETURL.GetResponse())
using (Stream objStream = http.GetResponseStream())
{
    //etc.
}
It guarantees that the Dispose method is called, even if an exception occurs. (https://msdn.microsoft.com/en-us/library/yh598w02.aspx)
The reason for the timeout is probably that your server has a limit on simultaneous requests. Due to the improper disposal, the connection stays open longer than needed. And although the garbage collector will eventually fix this for you, its timing is often too late.
That's why I always recommend calling Dispose, via using, for all objects that implement IDisposable. This is especially true when you use these objects in loops or on low-memory (low-resource) systems.
Be careful with streams though: they tend to use a decorator pattern and might call Dispose on all of their "child" objects.
Typically applies to:
Graphics objects
Database connections
TCP/IP (http etc.) connections
File system access
Code with native components, such as drivers for USB devices, webcams, etc.
Stream objects
The magic number "3" is from here:
The maximum number of concurrent connections allowed by a ServicePoint object. The default connection limit is 10 for ASP.NET hosted applications and 2 for all others.
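Putting it together, here is a minimal sketch of the periodic fetch with everything disposed each cycle (the URL and the 60-second interval mirror the question; the class and variable names are illustrative):
using System;
using System.IO;
using System.Net;
using System.Threading;

class PeriodicFetcher
{
    static void Main()
    {
        while (true)
        {
            try
            {
                WebRequest wrGETURL = WebRequest.Create("http://www.something.com");
                using (HttpWebResponse http = (HttpWebResponse)wrGETURL.GetResponse())
                using (Stream objStream = http.GetResponseStream())
                using (StreamReader reader = new StreamReader(objStream))
                {
                    // The connection is released as soon as the using blocks end.
                    string body = reader.ReadToEnd();
                    Console.WriteLine("Fetched {0} characters", body.Length);
                }
            }
            catch (WebException ex)
            {
                Console.WriteLine("Request failed: " + ex.Status);
            }

            Thread.Sleep(TimeSpan.FromSeconds(60)); // wait before the next run
        }
    }
}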
Related
When I perform a web request using HttpWebRequest in C#, I noticed that the first call to a URL/domain takes slightly longer than subsequent ones. Slightly longer in this case means about 100-150 ms longer, i.e. an overall time of 150-200 ms instead of 50 ms.
I googled this and came across several users reporting such behaviour. However, in all of these cases there was a delay of several seconds and the problem seems to be related to the proxy settings. That is not the case in my situation.
From experimenting with the "Connection: keep-alive" header I deduced that it has something to do with the opening of a connection. When I use "keep-alive", the delay is normal starting from the second request. When I use "Connection: close", all requests suffer from the described delay.
Here's the minimal code I use for reproducing this problem:
ServicePointManager.UseNagleAlgorithm = false;
ServicePointManager.Expect100Continue = false;
for (int i = 0; i < 3; ++i) {
    var url = "https://www.google.de";
    HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(url);
    req.KeepAlive = true;
    req.ReadWriteTimeout = 1500;
    req.Timeout = 1500;
    req.ServerCertificateValidationCallback = delegate { return true; };
    req.Proxy = null;
    req.ProtocolVersion = HttpVersion.Version11;

    var start = DateTime.Now;
    var resp = req.GetResponse();
    var end = DateTime.Now;

    Console.WriteLine((end - start).TotalMilliseconds);
    resp.Dispose();
}
This normally produces an output like this:
173.0381
57.3195
66.4853
One might be tempted to say that establishing the connection simply takes that long. So I analyzed the traffic with Fiddler, adding Console.WriteLine() calls for the two variables start and end to the code. That gives:
Start 07.08.2020 21:27:49.225
End 07.08.2020 21:27:49.430
Now I look at what Fiddler reports:
ClientConnected: 21:27:49.237
ClientBeginRequest: 21:27:49.335
GotRequestHeaders: 21:27:49.336
ClientDoneRequest: 21:27:49.336
ServerConnected: 21:27:49.274
FiddlerBeginRequest: 21:27:49.337
ServerGotRequest: 21:27:49.337
ServerBeginResponse: 21:27:49.401
GotResponseHeaders: 21:27:49.402
ServerDoneResponse: 21:27:49.430
ClientBeginResponse: 21:27:49.430
ClientDoneResponse: 21:27:49.430
Overall Elapsed: 0:00:00.094
So despite being connected at 21:27:49.274 the request only starts about 50 ms later at 21:27:49.335.
Things I've tried include the common recommendations that were given on similar issues on stackoverflow and the web:
Set the proxy explicitly to null to prevent automatic search for system proxy
In Internet Explorer network settings disable "Automatic Detection of Settings"
Use another URL. In the example here I use Google so everyone can reproduce this, but I also tested it with a URL of my own web server and a simple PHP script that just echoes the time.
Disabling the certificate check both via request specific req.ServerCertificateValidationCallback and the global ServicePointManager.ServerCertificateValidationCallback
Using a non-SSL URL. In this case the difference between the first and subsequent requests is still there, although it seems to be smaller.
Bypass the DNS lookup by providing the IP address in the HttpWebRequest.Create() call and later changing the Host-Property of the request object
Changing other ServicePointManager-related settings, i.e. disabling the Nagle algorithm and "Expect 100 Continue".
Use another computer. Use another Internet connection from a different provider. Use a VPN.
Use different versions of .NET. Normally I compile with Framework 4.8, but previous versions show the same behaviour. I also tried .NET Core; that has even worse overall performance, and the first request is still considerably slower than subsequent ones.
Use WebClient instead of HttpWebRequest
None of this resulted in a significant change in the behaviour: the first call is still slightly slower than all subsequent ones.
The one thing that did actually work was building the HTTPS request on my own using TcpClient and SslStream. In that case, all requests have the same latency of about 50 ms for Google. For most cases this is probably not the best solution; I would prefer to use a built-in .NET class.
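For reference, a minimal sketch of that TcpClient/SslStream approach (the host, the hard-coded request and the timing code are illustrative, not exact production code):
using System;
using System.Diagnostics;
using System.IO;
using System.Net.Security;
using System.Net.Sockets;
using System.Text;

class RawHttpsGet
{
    static void Main()
    {
        const string host = "www.google.de"; // same host as in the example above
        var sw = Stopwatch.StartNew();

        using (var client = new TcpClient(host, 443))
        using (var ssl = new SslStream(client.GetStream()))
        {
            ssl.AuthenticateAsClient(host); // TLS handshake

            // Minimal HTTP/1.1 GET; "Connection: close" makes the server end the response.
            byte[] request = Encoding.ASCII.GetBytes(
                "GET / HTTP/1.1\r\nHost: " + host + "\r\nConnection: close\r\n\r\n");
            ssl.Write(request, 0, request.Length);

            using (var reader = new StreamReader(ssl))
            {
                string statusLine = reader.ReadLine(); // e.g. "HTTP/1.1 200 OK"
                Console.WriteLine("{0} in {1} ms", statusLine, sw.ElapsedMilliseconds);
            }
        }
    }
}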
My questions are: Can you reproduce this? Might this be a .NET bug? Any more suggestions for what I could try to prevent this?
I am trying to figure out whether there is a limit on the call to "GetRequestStream".
To test this, I created a load test with 10 agents and 1 controller, all doing a POST call (object size 10 KB) with a user count of 10 (code below). I didn't call GetResponse(), to see whether I could make a large number of calls to GetRequestStream.
But what actually happened was that even with all those agents the load test didn't exceed 85-90 requests per second, and the contention point was this line in the code: using (Stream requestStream = request.GetRequestStream()).
I reduced the agents down to 2, but got the same result. The endpoint is a single server box. When I used a VIP (backed by 3 servers at the back end), I got 3 times the throughput, i.e. I was able to reach about 270 RPS.
When I increased the agents again to 15 against the VIP, the average still remained the same, so I concluded that some shared resource is in use during the call to GetRequestStream.
Based on an earlier SO post HttpWebRequest.GetRequestStream : What it does?
It mentions that the call to GetRequestStream actually ties up some resource at the server end, and irrespective of the number of POST calls generated, the server can only serve so many requests. I used to believe that GetRequestStream doesn't make any call, but just gets a stream; we write the object to the stream, and the actual call happens when we call GetResponse().
HttpWebRequest request =
(HttpWebRequest)HttpWebRequest.Create("http://some.existing.url");
request.Method = "POST";
request.ContentType = "text/xml";
Byte[] documentBytes = GetDocumentBytes ();
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(documentBytes, 0, documentBytes.Length);
    requestStream.Flush();
    requestStream.Close();
}
I would really appreciate it if someone could point me to resources where I can dig into more detail about this, or explain this behavior.
Per the HTTP 1.1 RFC, HttpWebRequest has a default connection limit of two from a client to a host – beyond two concurrent users, requests queue up and latency rises. The limit can be changed through the app.config of the client process, see here and here.
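If you'd rather not touch app.config, the same limit can be raised in code; a small sketch (the value 20 and the URL are just examples, not recommendations):
using System;
using System.Net;

class ConnectionLimitDemo
{
    static void Main()
    {
        // Raise the default limit for all hosts (set this before requests are created)...
        ServicePointManager.DefaultConnectionLimit = 20;

        // ...or raise it only for one specific host.
        ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://some.existing.url"));
        sp.ConnectionLimit = 20;

        Console.WriteLine("Default limit: {0}, host limit: {1}",
            ServicePointManager.DefaultConnectionLimit, sp.ConnectionLimit);
    }
}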
I am developing an application using the Twitter API, and that involves writing a method to check whether a user exists. Here is my code:
public static bool checkUserExists(string user)
{
    //string URL = "https://twitter.com/" + user.Trim();
    //string URL = "http://api.twitter.com/1/users/show.xml?screen_name=" + user.Trim();
    //string URL = "http://google.com/#hl=en&sclient=psy-ab&q=" + user.Trim();
    string URL = "http://api.twitter.com/1/users/show.json?screen_name=" + user.Trim();
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(URL);
    try
    {
        var webResponse = (HttpWebResponse)webRequest.GetResponse();
        return true;
    }
    //this part onwards does not matter
    catch (WebException ex)
    {
        if (ex.Status == WebExceptionStatus.ProtocolError && ex.Response != null)
        {
            var resp = (HttpWebResponse)ex.Response;
            if (resp.StatusCode == HttpStatusCode.NotFound)
            {
                return false;
            }
            else
            {
                throw new Exception("Unknown level 1 Exception", ex);
            }
        }
        else
        {
            throw new Exception("Unknown level 2 Exception", ex);
        }
    }
}
The problem is that calling the method does not work (it doesn't get a response) more than 2 or 3 times, using any of the URLs that are commented out, including the Google search query (I thought it might be due to the Twitter API limit). On debug, it shows that it's stuck at:
var webResponse = (HttpWebResponse)webRequest.GetResponse();
Here's how I am calling it:
Console.WriteLine(TwitterFollowers.checkUserExists("handle1"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle2"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle3"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle4"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle5"));
Console.WriteLine(TwitterFollowers.checkUserExists("handle6"));
At most I get 2-3 lines of output. Could someone please point out what's wrong?
Update 1:
I sent 1 request every 15 seconds (well within the limit) and it still causes an error. On the other hand, sending a request, closing the app and running it again works very well (which on average amounts to 1 request every 5 seconds). The rate limit is 150 calls per hour (Twitter FAQ).
Also, I did wait for a while, and got this exception at level 2:
http://pastie.org/3897499
Update 2:
It might sound surprising, but if I run Fiddler, it works perfectly – regardless of whether I target this process or not!
The effect you're seeing is almost certainly due to rate-limit type policies on the Twitter API (multiple requests in quick succession). They keep a tight watch on how you're using their API: the first step is to check their terms of use and policies on rate limiting, and make sure you're in compliance.
Two things jump out at me:
You're hitting the API with multiple requests in rapid succession. Most REST APIs, including Google search, are not going to allow you to do that. These APIs are very visible targets, and it makes sense that they'd be pro-active about preventing denial-of-service attacks.
You don't have a User Agent specified in your request. Most APIs require you to send them a meaningful UA, as a way of helping them identify you.
Note that you're dealing with unmanaged resources underneath your HttpWebResponse. So calling Dispose() in a timely fashion, or wrapping the object in a using statement, is not only wise but important to avoid blocking.
Also, var is great for dealing with anonymous types, LINQ query results, and such, but it should not become a crutch. Why use var when you're well aware of the type? (i.e. you're already performing a cast to HttpWebResponse.)
Finally, services like this often limit the rate of connections per second and/or the number of simultaneous connections allowed, to prevent abuse. By not disposing of your HttpWebResponse objects, you may be exceeding the permitted number of simultaneous connections; by querying too often, you'd break the rate limit.
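To make those two points concrete, a hedged rewrite of checkUserExists that disposes the response and sends a User-Agent might look like this (the UA string is a made-up placeholder, not a required value):
public static bool checkUserExists(string user)
{
    string URL = "http://api.twitter.com/1/users/show.json?screen_name=" + user.Trim();
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(URL);
    webRequest.UserAgent = "MyTwitterChecker/1.0"; // hypothetical identifier for this app

    try
    {
        // Disposing the response releases the underlying connection immediately.
        using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
        {
            return true;
        }
    }
    catch (WebException ex)
    {
        HttpWebResponse resp = ex.Response as HttpWebResponse;
        if (ex.Status == WebExceptionStatus.ProtocolError && resp != null)
        {
            using (resp)
            {
                if (resp.StatusCode == HttpStatusCode.NotFound)
                    return false;
                throw new Exception("Unknown level 1 Exception", ex);
            }
        }
        throw new Exception("Unknown level 2 Exception", ex);
    }
}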
I have a WPF app that processes a lot of URLs (thousands); it sends each one off to its own thread, does some processing and stores a result in the database.
The URLs can be anything, but some seem to be massively big pages; this shoots the memory usage up a lot and makes performance really bad. I set a timeout on the web request, so that if it takes longer than, say, 20 seconds it doesn't bother with that URL, but it doesn't seem to make much difference.
Here's the code section:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;
req.Method = "GET";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    pageSource = reader.ReadToEnd();
    req = null;
}
It also seems to stall/ramp up memory on reader.ReadToEnd();
I would have thought having a cut-off of 20 seconds would help; is there a better method? I assume there's not much advantage to using an async web method, as each URL download is on its own thread anyway.
Thanks
In general, it's recommended that you use asynchronous HttpWebRequests instead of creating your own threads. The article I've linked above also includes some benchmarking results.
I don't know what you're doing with the page source after you read the stream to end, but using string can be an issue:
The System.String type is used in any .NET application. We have strings as: names, addresses, descriptions, error messages, warnings or even application settings. Each application has to create, compare or format string data. Considering the immutability and the fact that any object can be converted to a string, all the available memory can be swallowed by a huge amount of unwanted string duplicates or unclaimed string objects.
Some other suggestions:
Do you have any firewall restrictions? I've seen a lot of issues at work where the firewall enables rate limiting and fetching pages grinds down to a halt (happens to me all the time)!
I presume that you're going to use the string to parse HTML, so I would recommend that you initialize your parser with the Stream instead of passing in a string containing the page source (if that's an option).
If you're storing the page source in the database, then there isn't much you can do.
Try to eliminate the reading of the page source as a potential contributor to the memory/performance problem by commenting it out.
Use a streaming HTML parser such as Majestic 12; it avoids the need to load the entire page source into memory (again, if you need to parse)!
Limit the size of the pages you're going to download, say, only download 150 KB. The average page size is about 100-130 KB.
Additionally, can you tell us what's your initial rate of fetching pages and what does it go down to? Are you seeing any errors/exceptions from the web request as you're fetching pages?
Update
In the comment section I noticed that you're creating thousands of threads, and I would say that you don't need to do that. Start with a small number of threads and keep increasing the number until performance peaks on your system. Once you start adding threads and performance looks like it has tapered off, stop adding threads. I can't imagine that you will need more than 128 threads (even that seems high). Create a fixed number of threads, e.g. 64, let each thread take a URL from your queue, fetch the page, process it and then go back to getting pages from the queue again.
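A minimal sketch of that fixed worker pool (the thread count, queue contents and URLs are illustrative assumptions):
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Net;
using System.Threading;

class WorkerPoolFetcher
{
    static void Main()
    {
        var queue = new BlockingCollection<string>();
        queue.Add("http://example.com/page1"); // assumed URLs
        queue.Add("http://example.com/page2");
        queue.CompleteAdding();

        const int workerCount = 8; // start small and tune upward
        var workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                // Each worker pulls URLs from the shared queue until it is empty.
                foreach (string url in queue.GetConsumingEnumerable())
                {
                    try
                    {
                        var req = (HttpWebRequest)WebRequest.Create(url);
                        req.Timeout = 20000;
                        using (var resp = (HttpWebResponse)req.GetResponse())
                        using (var reader = new StreamReader(resp.GetResponseStream()))
                        {
                            string pageSource = reader.ReadToEnd();
                            Console.WriteLine("{0}: {1} chars", url, pageSource.Length);
                        }
                    }
                    catch (WebException ex)
                    {
                        Console.WriteLine("{0} failed: {1}", url, ex.Status);
                    }
                }
            });
            workers[i].Start();
        }
        foreach (var w in workers) w.Join();
    }
}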
You could enumerate with a buffer instead of calling ReadToEnd, and if it is taking too long, then you could log and abandon - something like:
static void Main(string[] args)
{
    Uri largeUri = new Uri("http://www.rfkbau.de/index.php?option=com_easybook&Itemid=22&startpage=7096");
    DateTime start = DateTime.Now;
    int timeoutSeconds = 10;
    foreach (var s in ReadLargePage(largeUri))
    {
        if ((DateTime.Now - start).TotalSeconds > timeoutSeconds)
        {
            Console.WriteLine("Stopping - this is taking too long.");
            break;
        }
    }
}

static IEnumerable<string> ReadLargePage(Uri uri)
{
    int bufferSize = 8192;
    int readCount;
    Char[] readBuffer = new Char[bufferSize];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        readCount = stream.Read(readBuffer, 0, bufferSize);
        while (readCount > 0)
        {
            // Only hand back the characters that were actually read in this chunk.
            yield return new string(readBuffer, 0, readCount);
            readCount = stream.Read(readBuffer, 0, bufferSize);
        }
    }
}
Lirik has a really good summary.
I would add that if I were implementing this, I would make a separate process that reads the pages, so it becomes a pipeline: the first stage downloads the URL and writes it to a disk location, then queues that file to the next stage; the next stage reads it from disk and does the parsing and DB updates. That way you get maximum throughput on the download and on the parsing as well. You can also tune your thread pools so that you have more workers parsing, etc. This architecture also lends itself very well to distributed processing, where you can have one machine downloading and another host parsing, etc. A rough sketch is below.
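Here is that rough sketch of the two-stage pipeline, under the assumption that a BlockingCollection hands work between stages (the URLs, temp-file handling and the "parse" step are placeholders):
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Net;
using System.Threading.Tasks;

class DownloadParsePipeline
{
    static void Main()
    {
        var urls = new BlockingCollection<string>();
        var files = new BlockingCollection<string>();
        urls.Add("http://example.com/page1"); // assumed URL
        urls.CompleteAdding();

        // Stage 1: download each URL and write the body to a temp file.
        var downloader = Task.Run(() =>
        {
            foreach (string url in urls.GetConsumingEnumerable())
            {
                try
                {
                    using (var client = new WebClient())
                    {
                        string path = Path.GetTempFileName();
                        client.DownloadFile(url, path);
                        files.Add(path);
                    }
                }
                catch (WebException ex) { Console.WriteLine("{0}: {1}", url, ex.Status); }
            }
            files.CompleteAdding();
        });

        // Stage 2: read each file back and process it (parsing/DB work would go here).
        var parser = Task.Run(() =>
        {
            foreach (string path in files.GetConsumingEnumerable())
            {
                string pageSource = File.ReadAllText(path);
                Console.WriteLine("Parsed {0} chars from {1}", pageSource.Length, path);
                File.Delete(path);
            }
        });

        Task.WaitAll(downloader, parser);
    }
}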
Another thing to note is that if you are hitting the same server from multiple threads (even if you are using async), you will run into the maximum outgoing connection limit. You can throttle yourself to stay below it, or increase the connection limit on the ServicePointManager class.
I believe after lengthy research and searching, I have discovered that what I want to do is probably better served by setting up an asynchronous connection and terminating it after the desired timeout... But I will go ahead and ask anyway!
Quick snippet of code:
HttpWebRequest webReq = (HttpWebRequest)HttpWebRequest.Create(url);
webReq.Timeout = 5000;
HttpWebResponse response = (HttpWebResponse)webReq.GetResponse();
// this takes ~20+ sec on servers that aren't on the proper port, etc.
I have an HttpWebRequest method that is in a multi-threaded application, in which I am connecting to a large number of company web servers. In cases where the server is not responding, the HttpWebRequest.GetResponse() is taking about 20 seconds to time out, even though I have specified a timeout of only 5 seconds. In the interest of getting through the servers on a regular interval, I want to skip those taking longer than 5 seconds to connect to.
So the question is: "Is there a simple way to specify/decrease a connection timeout for a WebRequest or HttpWebRequest?"
I believe that the problem is that the WebRequest measures the time only after the request is actually made. If you submit multiple requests to the same address, then the ServicePointManager will throttle your requests and only actually submit as many concurrent connections as the value of the corresponding ServicePoint.ConnectionLimit, which by default gets its value from ServicePointManager.DefaultConnectionLimit. The application CLR host sets this to 2, the ASP.NET host to 10. So if you have a multithreaded application that submits multiple requests to the same host, only two are actually placed on the wire; the rest are queued up.
I have not researched this to conclusive evidence that this is what really happens, but on a similar project I had, things were horrible until I removed the ServicePoint limitation.
Another factor to consider is the DNS lookup time. Again, this is my belief, not backed by hard evidence, but I think the WebRequest does not count the DNS lookup time against the request timeout. DNS lookup time can be a very big factor on some deployments.
And yes, you must code your app around WebRequest.BeginGetRequestStream (for POSTs with content) and WebRequest.BeginGetResponse (for GETs and POSTs). Synchronous calls will not scale (I won't go into details about why, but I do have hard evidence for that). Anyway, the ServicePoint issue is orthogonal to this: the queueing behavior happens with async calls too.
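A minimal sketch of that async pattern for a GET, with the timeout enforced by aborting the request via ThreadPool.RegisterWaitForSingleObject (the URL and the 5-second limit are illustrative):
using System;
using System.Net;
using System.Threading;

class AsyncGetWithTimeout
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create("http://www.something.com");
        IAsyncResult result = request.BeginGetResponse(EndResponse, request);

        // Abort the request if no response arrives within 5 seconds.
        ThreadPool.RegisterWaitForSingleObject(result.AsyncWaitHandle,
            (state, timedOut) => { if (timedOut) ((HttpWebRequest)state).Abort(); },
            request, TimeSpan.FromSeconds(5), true);

        Console.ReadLine(); // keep the process alive for the demo
    }

    static void EndResponse(IAsyncResult ar)
    {
        var request = (HttpWebRequest)ar.AsyncState;
        try
        {
            using (var response = (HttpWebResponse)request.EndGetResponse(ar))
            {
                Console.WriteLine("Status: " + response.StatusCode);
            }
        }
        catch (WebException ex)
        {
            Console.WriteLine("Request failed or was aborted: " + ex.Status);
        }
    }
}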
Sorry for tacking on to an old thread, but I think something that was said above may be incorrect/misleading.
From what I can tell, .Timeout is NOT the connection time; it is the TOTAL time allowed for the entire life of the HttpWebRequest and response. Proof:
I set:
.Timeout=5000
.ReadWriteTimeout=32000
The connect and post time for the HttpWebRequest took 26 ms, but the subsequent call to HttpWebRequest.GetResponse() timed out after 4974 ms, thus proving that the 5000 ms was the time limit for the whole send-request/get-response set of calls.
I didn't verify whether the DNS name resolution was measured as part of the time, as this is irrelevant to me since none of this works the way I really need it to: my intention was to time out more quickly when connecting to systems that weren't accepting connections, as shown by them failing during the connect phase of the request.
For example: I'm willing to wait 30 seconds on a connection request that has a chance of returning a result, but I only want to burn 10 seconds waiting to send a request to a host that is misbehaving.
Something I found later that helped is the .ReadWriteTimeout property. This, in addition to the .Timeout property, seemed to finally cut down on the time threads would spend trying to download from a problematic server. The default for .ReadWriteTimeout is 5 minutes, which for my application was far too long.
So, it seems to me:
.Timeout = time spent trying to establish a connection (not including lookup time)
.ReadWriteTimeout = time spent trying to read or write data after connection established
More info: HttpWebRequest.ReadWriteTimeout Property
Edit:
Per #KyleM's comment, the Timeout property is for the entire connection attempt, and reading up on it at MSDN shows:
Timeout is the number of milliseconds that a subsequent synchronous request made with the GetResponse method waits for a response, and the GetRequestStream method waits for a stream. The Timeout applies to the entire request and response, not individually to the GetRequestStream and GetResponse method calls. If the resource is not returned within the time-out period, the request throws a WebException with the Status property set to WebExceptionStatus.Timeout.
(Emphasis mine.)
From the documentation of the HttpWebRequest.Timeout property:
A Domain Name System (DNS) query may take up to 15 seconds to return or time out. If your request contains a host name that requires resolution and you set Timeout to a value less than 15 seconds, it may take 15 seconds or more before a WebException is thrown to indicate a timeout on your request.
Is it possible that your DNS query is the cause of the timeout?
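One quick way to check is to time the DNS lookup on its own, independent of the HTTP request (the host name below is illustrative):
using System;
using System.Diagnostics;
using System.Net;

class DnsTimingCheck
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        IPAddress[] addresses = Dns.GetHostAddresses("www.something.com");
        sw.Stop();
        Console.WriteLine("Resolved {0} address(es) in {1} ms",
            addresses.Length, sw.ElapsedMilliseconds);
    }
}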
No matter what we tried we couldn't manage to get the timeout below 21 seconds when the server we were checking was down.
To work around this, we combined a TcpClient check to see whether the domain was alive with a separate check to see whether the URL was active:
public static bool IsUrlAlive(string aUrl, int aTimeoutSeconds)
{
    try
    {
        // Check the domain first.
        if (IsDomainAlive(new Uri(aUrl).Host, aTimeoutSeconds))
        {
            // Only now check the URL itself.
            var request = System.Net.WebRequest.Create(aUrl);
            request.Method = "HEAD";
            request.Timeout = aTimeoutSeconds * 1000;
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
    }
    catch
    {
        // Any failure means the URL is treated as not alive.
    }
    return false;
}

private static bool IsDomainAlive(string aDomain, int aTimeoutSeconds)
{
    try
    {
        using (TcpClient client = new TcpClient())
        {
            var result = client.BeginConnect(aDomain, 80, null, null);
            var success = result.AsyncWaitHandle.WaitOne(TimeSpan.FromSeconds(aTimeoutSeconds));
            if (!success)
            {
                return false;
            }
            // We have connected.
            client.EndConnect(result);
            return true;
        }
    }
    catch
    {
        // Any failure (DNS, refused connection, etc.) means the domain is treated as not alive.
    }
    return false;
}
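Example usage (the URL and timeout value are illustrative):
bool alive = IsUrlAlive("http://www.something.com", 5);
Console.WriteLine(alive ? "URL is reachable" : "URL is not reachable");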