I am trying to figure out whether there is a limit on calls to "GetRequestStream".
To test this, I created a load test with 10 agents and 1 controller, all making POST calls (object size 10 KB) with a user count of 10 (code below). I deliberately did not call "GetResponse()", to see whether I could make a large number of calls to GetRequestStream alone.
What actually happened was that even with all those agents, the load test never exceeded 85-90 requests per second, and the contention point was this line in the code: "using (Stream requestStream = request.GetRequestStream())".
I reduced the agents to 2, with the same result. The endpoint is a single server box. When I used a VIP (backed by 3 servers), I got three times the throughput, i.e. about 270 RPS.
When I increased the agents back to 15 against the VIP, the average still remained the same, so I concluded that some shared resource is in use during the call to GetRequestStream.
This is based on an earlier SO post: HttpWebRequest.GetRequestStream : What it does?
It mentions that the call to GetRequestStream actually ties up some resource at the server end, so irrespective of the number of POST calls generated, the server can only serve so many requests. I used to believe that GetRequestStream doesn't make any call at all, that it just gets a stream and writes the object to it, and that the actual call happens when we call GetResponse().
HttpWebRequest request =
    (HttpWebRequest)HttpWebRequest.Create("http://some.existing.url");
request.Method = "POST";
request.ContentType = "text/xml";

Byte[] documentBytes = GetDocumentBytes();

using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(documentBytes, 0, documentBytes.Length);
    requestStream.Flush();
    requestStream.Close();
}
I would really appreciate it if someone could point me to some resources where I can dig into this in more detail, or explain this behavior.
Per the HTTP 1.1 RFC, HttpWebRequest has a default limit of two connections from a client to a host, which will cap throughput once you go beyond two concurrent users. It can be changed through the app.config of the client process, see here and here.
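For example, a minimal sketch of raising that limit in code before any requests go out (48 is an arbitrary value, and the URL is the one from the snippet above):

using System.Net;

// Raise the per-host connection limit before the first request is created.
// The default for a plain client process is 2; 48 here is only an example value.
ServicePointManager.DefaultConnectionLimit = 48;

// Or raise it just for the endpoint under test:
ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://some.existing.url"));
sp.ConnectionLimit = 48;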
Related
I am trying to iterate over a list of 20,000+ customer records. I am using a Parallel.ForEach() loop to try to speed up the processing. Inside the delegate, I make an HTTP POST to an external web service to verify the customer information. In doing so, the loop is effectively limited to 2 threads. If I attempt to increase the degree of parallelism, the process throws the error "The underlying connection was closed: A connection that was expected to be kept alive was closed by the server"
Is this default behavior of the loop when working with external processes or a limitation of the receiving web server?
My code is rather straight forward:
Parallel.ForEach(customerlist, new ParallelOptions { MaxDegreeOfParallelism = 3 }, (currentCustomer) =>
{
    // IsNotACustomer is where the HTTP POST takes place
    if (IsNotACustomer(currentCustomer.TIN) == true)
    {
        // ...write data to flat file...
    }
});
If I change MaxDegreeOfParallelism to 2, the loop runs fine.
This code takes about 80 minutes to churn through the 20,000 records. While that is not unacceptable, if I could shorten that time by increasing the number of threads, all the better.
Full exception message (without stack trace):
System.Net.WebException: The underlying connection was closed: A
connection that was expected to be kept alive was closed by the
server.
at System.Net.HttpWebRequest.GetResponse()
Any assistance would be greatly appreciated.
EDIT
The HTTP POST code is:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(AppConfig.ESLBridgeURL + action);
request.Method = "POST";
byte[] bodyBytes = Encoding.UTF8.GetBytes(body);
using (Stream requestStream = request.GetRequestStream())
    requestStream.Write(bodyBytes, 0, bodyBytes.Length); // byte count, not string length
using (Stream stream = request.GetResponse().GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    output = reader.ReadToEnd();
}
The URL points to an in-house server running proprietary WebSphere MQ services, the gist of which is to check internal data sources to see whether or not we have a relationship with the customer.
We run this same process in our customer relationship management system at hundreds of sites per day, so I do not believe there is any licensing issue, and I am certain these MQ services can accept multiple calls per client.
EDIT 2
A little more research has shown that the 2-connection limit is real. However, ServicePointManager may be able to raise that limit. What I cannot find is a C# example of using ServicePointManager with HttpWebRequest.
Can anyone point me to a valid resource or provide a code example?
You might be running up against the default 2-connection limit. See System.Net.ServicePointManager.DefaultConnectionLimit on MSDN.
The maximum number of concurrent connections allowed by a ServicePoint object. The default value is 2.
Possibly relevant question: How Can I programmatically remove the 2 connection limit in WebClient?
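For what it's worth, a minimal C# sketch (the limit of 16 is illustrative; you would run this once at startup, before the Parallel.ForEach):

using System;
using System.Net;

// Raise the per-host connection limit before any HttpWebRequest is created.
// 16 is an illustrative value; tune it against what the MQ endpoint can handle.
ServicePointManager.DefaultConnectionLimit = 16;

// Or target just the one endpoint instead of changing the global default:
ServicePoint sp = ServicePointManager.FindServicePoint(new Uri(AppConfig.ESLBridgeURL));
sp.ConnectionLimit = 16;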
Thank you Matt Stephenson and Matt Jordan for pointing me in the correct direction.
I found a solution that has cut my processing in half. I will continue to tweak to get the best results, but here is what I arrived at.
I added the following to the application config file:
<system.net>
  <connectionManagement>
    <add address="*" maxconnection="100"/>
  </connectionManagement>
</system.net>
I then figured out how to use the ServicePointManager and set the following:
int dop = Environment.ProcessorCount;
ServicePointManager.MaxServicePoints = 4;
ServicePointManager.MaxServicePointIdleTime = 10000;
ServicePointManager.UseNagleAlgorithm = true;
ServicePointManager.Expect100Continue = false;
ServicePointManager.DefaultConnectionLimit = dop * 10;
ServicePoint sp = ServicePointManager.FindServicePoint ( new Uri ( AppConfig.ESLBridgeURL ) );
For my development machine, the Processor Count is 8.
This code, as is, allows me to process my 20,000+ records in roughly 45 minutes (give or take).
I have a thread that runs every 60 seconds. This thread gets a response from a web URL. Everything is fine until the third run; after that it no longer works and shows this error:
"The operation has timed out"
Here is my code; the error occurs on the GetResponse() call. Thanks!
string sURL = "http://www.something.com";

WebRequest wrGETURL = WebRequest.Create(sURL);
HttpWebResponse http = (HttpWebResponse)wrGETURL.GetResponse(); // times out here on the third run
Stream objStream = http.GetResponseStream();
You might want to consider using the using statement:
string sURL = "http://www.something.com";
WebRequest wrGETURL = WebRequest.Create(sURL);

// WebRequest itself is not IDisposable, but the response and its stream are:
using (HttpWebResponse http = (HttpWebResponse)wrGETURL.GetResponse())
using (Stream objStream = http.GetResponseStream())
{
    //etc.
}
It guarantees that the Dispose method is called, even if an exception occurs (https://msdn.microsoft.com/en-us/library/yh598w02.aspx).
The reason for the timeout is probably a limit of x simultaneous connections: because of the improper disposal, each connection stays open longer than needed, and although the garbage collector will eventually clean this up for you, its timing is often too late.
That's why I always recommend calling Dispose, via using, for every object that implements IDisposable. This is especially true when you use these objects in loops or on low-memory (low-resource) systems.
Be careful with streams, though: they tend to follow a decorator pattern and may call Dispose on all of their "child" objects (see the short sketch after the list below).
Typically this applies to:
Graphics objects
Database connections
TCP/IP (HTTP, etc.) connections
File system access
Code wrapping native components, such as drivers for USB devices, webcams, etc.
Stream objects
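As a small illustration of that decorator behaviour (a sketch only; the file name is hypothetical): disposing the outer reader also disposes the stream it wraps.

using System.IO;

// Disposing the StreamReader (the decorator) also disposes the FileStream it wraps,
// so the inner stream must not be used afterwards.
FileStream inner = File.OpenRead("example.txt"); // hypothetical file name
using (StreamReader reader = new StreamReader(inner))
{
    string firstLine = reader.ReadLine();
} // reader.Dispose() runs here and closes 'inner' as well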
The magic number "3" is from here:
The maximum number of concurrent connections allowed by a ServicePoint object. The default connection limit is 10 for ASP.NET hosted applications and 2 for all others.
I have a WPF app that processes a lot of URLs (thousands); each is sent off to its own thread, does some processing, and stores a result in the database.
The URLs can be anything, but some are massively big pages. This shoots memory usage up a lot and makes performance really bad. I set a timeout on the web request, so that if it takes longer than, say, 20 seconds it skips that URL, but that does not seem to make much difference.
Here's the code section:
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;
req.Method = "GET";
req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
    pageSource = reader.ReadToEnd();
    req = null;
}
It also seems to stall and ramp up memory on reader.ReadToEnd();
I would have thought having a cut-off of 20 seconds would help. Is there a better method? I assume there's not much advantage to using the async web methods, as each URL download is on its own thread anyway.
Thanks
In general, it's recommended that you use asynchronous HttpWebRequests instead of creating your own threads. The article I've linked above also includes some benchmarking results.
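As a rough sketch of that asynchronous pattern (ProcessPage here is a hypothetical handler of your own; urlAddress comes from your snippet), the request can be started with BeginGetResponse instead of blocking a dedicated thread:

using System;
using System.IO;
using System.Net;

// Sketch only: starts an asynchronous GET and processes the body in the callback.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(urlAddress.Address);
req.Method = "GET";
req.BeginGetResponse(ar =>
{
    HttpWebRequest r = (HttpWebRequest)ar.AsyncState;
    using (HttpWebResponse resp = (HttpWebResponse)r.EndGetResponse(ar))
    using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
    {
        ProcessPage(reader.ReadToEnd()); // ProcessPage: hypothetical handler
    }
}, req);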
I don't know what you're doing with the page source after you read the stream to end, but using string can be an issue:
System.String type is used in any .NET application. We have strings as: names, addresses, descriptions, error messages, warnings or even application settings. Each application has to create, compare or format string data. Considering the immutability and the fact that any object can be converted to a string, all the available memory can be swallowed by a huge amount of unwanted string duplicates or unclaimed string objects.
Some other suggestions:
Do you have any firewall restrictions? I've seen a lot of issues at work where the firewall enables rate limiting and fetching pages grinds to a halt (happens to me all the time)!
I presume that you're going to use the string to parse HTML, so I would recommend that you initialize your parser with the Stream instead of passing in a string containing the page source (if that's an option).
If you're storing the page source in the database, then there isn't much you can do.
Try to eliminate the reading of the page source as a potential contributor to the memory/performance problem by commenting it out.
Use a streaming HTML parser such as Majestic-12, which avoids the need to load the entire page source into memory (again, if you need to parse)!
Limit the size of the pages you download, say to 150 KB; the average page size is about 100-130 KB (see the sketch after this list).
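A possible sketch of that size cap (150 KB is just the figure from the suggestion above): read from the response stream in chunks and stop once the limit is reached, rather than calling ReadToEnd.

using System;
using System.IO;
using System.Net;
using System.Text;

// Sketch: download at most roughly maxBytes of the page body, then stop.
const int maxBytes = 150 * 1024;
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(urlAddress.Address);
req.Timeout = 20000;
req.ReadWriteTimeout = 20000;

var sb = new StringBuilder();
using (var resp = (HttpWebResponse)req.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream()))
{
    char[] buffer = new char[8192];
    int total = 0, read;
    while (total < maxBytes && (read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        sb.Append(buffer, 0, read);
        total += read; // counts chars, which is close enough to bytes for this cut-off
    }
}
string pageSource = sb.ToString();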
Additionally, can you tell us what's your initial rate of fetching pages and what does it go down to? Are you seeing any errors/exceptions from the web request as you're fetching pages?
Update
In the comment section I noticed that you're creating thousands of threads, and I would say that you don't need to do that. Start with a small number of threads and keep increasing them until you find the peak performance on your system. Once adding threads no longer improves performance, stop adding them. I can't imagine that you will need more than 128 threads (even that seems high). Create a fixed number of threads, e.g. 64, let each thread take a URL from your queue, fetch the page, process it, and then go back to the queue for more.
You could enumerate with a buffer instead of calling ReadToEnd, and if it is taking too long, then you could log and abandon - something like:
static void Main(string[] args)
{
    Uri largeUri = new Uri("http://www.rfkbau.de/index.php?option=com_easybook&Itemid=22&startpage=7096");
    DateTime start = DateTime.Now;
    int timeoutSeconds = 10;
    foreach (var s in ReadLargePage(largeUri))
    {
        if ((DateTime.Now - start).TotalSeconds > timeoutSeconds)
        {
            Console.WriteLine("Stopping - this is taking too long.");
            break;
        }
    }
}

static IEnumerable<string> ReadLargePage(Uri uri)
{
    int bufferSize = 8192;
    int readCount;
    Char[] readBuffer = new Char[bufferSize];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader stream = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
    {
        readCount = stream.Read(readBuffer, 0, bufferSize);
        while (readCount > 0)
        {
            // yield only the characters actually read, not the full buffer
            yield return new string(readBuffer, 0, readCount);
            readCount = stream.Read(readBuffer, 0, bufferSize);
        }
    }
}
Lirik has a really good summary.
I would add that if I were implementing this, I would make a separate process that reads the pages, so it becomes a pipeline. The first stage downloads the URL and writes it to a disk location, then queues that file for the next stage. The next stage reads the file from disk and does the parsing and DB updates. That way you get maximum throughput on the download and the parsing, and you can also tune your thread pools so that you have more workers parsing, etc. This architecture also lends itself very well to distributed processing, where you can have one machine downloading and another host parsing, etc.
Another thing to note is that if you are hitting the same server from multiple threads (even if you are using async) you will run into the maximum outgoing connection limit. You can throttle yourself to stay below it, or increase the connection limit on the ServicePointManager class.
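A minimal sketch of that pipeline idea (the stage methods urlsToProcess, DownloadPage and ParseAndStore are hypothetical names, and it keeps pages in memory rather than on disk for brevity):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Stage 1 downloads pages into a bounded queue; stage 2 parses and writes to the DB.
var downloaded = new BlockingCollection<string>(boundedCapacity: 100);

Task downloadStage = Task.Factory.StartNew(() =>
{
    foreach (string url in urlsToProcess)            // urlsToProcess: your URL source
        downloaded.Add(DownloadPage(url));           // DownloadPage: hypothetical, uses HttpWebRequest
    downloaded.CompleteAdding();
});

Task parseStage = Task.Factory.StartNew(() =>
{
    foreach (string pageSource in downloaded.GetConsumingEnumerable())
        ParseAndStore(pageSource);                   // ParseAndStore: hypothetical parse + DB update
});

Task.WaitAll(downloadStage, parseStage);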
I'm posting a file with HttpWebRequest, along with a header and footer. The header (ca. 0.5 KB) and the actual file seem to write fine, but with large files (ca. 15 MB) the footer (which is only about 29 bytes) never seems to write.
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(postHeaderBytes, 0, postHeaderBytes.Length);

    byte[] buffer = new byte[Math.Min(4096L, fileSize)];
    int bytesRead = 0;
    while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) != 0)
    {
        requestStream.Write(buffer, 0, bytesRead);
    }

    // next line never completes
    requestStream.Write(postFooterBytes, 0, postFooterBytes.Length);

    // code below is never reached
    Console.WriteLine("Why do I never see this message in the console?");
}
Any thoughts?
ETA: I tried flushing the stream before the last Write(), on the off chance it would help, but to no effect.
Edited again: Added using() to clarify that I'm not a complete idiot. Note also, BTW, that this is inside another using() block for fileStream.
Solved: Turned off AllowWriteStreamBuffering on the HttpWebRequest. It looks like when buffering is on, whichever Write() call writes the last byte does not return until the internal buffer is cleared. So the last Write() was eventually completing, just not before I ran out of patience.
And since what I was originally trying to do was report progress, turning off buffering makes things clearer anyway.
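For reference, a rough sketch of that setup (uploadUrl is a placeholder; with buffering off, ContentLength has to be set up front, since the request can no longer buffer the body to compute it):

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uploadUrl); // uploadUrl: your endpoint
request.Method = "POST";
request.AllowWriteStreamBuffering = false;                             // write straight to the wire
request.ContentLength = postHeaderBytes.Length + fileSize + postFooterBytes.Length;

using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(postHeaderBytes, 0, postHeaderBytes.Length);
    // ...copy the file in chunks as before, then write postFooterBytes...
}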
A common problem is forgetting to close the request stream. One of the symptoms you'll see is that the request is never made: it's quite likely that the write really is completing, but since you didn't close the request stream, the call to HttpWebRequest.GetResponse() appears never to execute.
Try the following and see if it makes a difference:
using (var requestStream = myRequest.GetRequestStream())
{
// write to the request stream here
}
// Now try to get the response.
Another possible issue is the size of the data. First, are you sure that the server can handle a 15 MB upload? Second, if you're doing this on a slow connection, 15 MB can take a while to send. I have what's considered a "fast" upstream connection at 1.5 megabits/sec; that's at most about 0.19 megabytes per second, so sending 15 megabytes will take well over a minute.
One other possibility is that the request is timing out. You want to look into the HttpWebRequest.Timeout and ReadWriteTimeout properties.
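For example, both can be raised for a large synchronous upload (the values and uploadUrl below are placeholders):

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uploadUrl); // uploadUrl: your endpoint
request.Method = "POST";
request.Timeout = 10 * 60 * 1000;          // applies to the entire request/response, in ms
request.ReadWriteTimeout = 10 * 60 * 1000; // time allowed for stream reads/writes, in ms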
When you are building your request, the content length should include the header and footer bytes as well; make sure it's not just set to the file length.
The other thing you might try is calling .Flush() on the stream when all is said and done.
I'm not sure of the implication of closing the stream for the HttpWebRequest as Jim suggests; it may work, or it may make things worse.
Does System.Net.WebClient not offer enough flexibility for you? There's a nice UploadFile() method you can use.
I believe after lengthy research and searching, I have discovered that what I want to do is probably better served by setting up an asynchronous connection and terminating it after the desired timeout... But I will go ahead and ask anyway!
Quick snippet of code:
HttpWebRequest webReq = (HttpWebRequest)HttpWebRequest.Create(url);
webReq.Timeout = 5000;
HttpWebResponse response = (HttpWebResponse)webReq.GetResponse();
// this takes ~20+ sec on servers that aren't on the proper port, etc.
I have an HttpWebRequest method that is in a multi-threaded application, in which I am connecting to a large number of company web servers. In cases where the server is not responding, the HttpWebRequest.GetResponse() is taking about 20 seconds to time out, even though I have specified a timeout of only 5 seconds. In the interest of getting through the servers on a regular interval, I want to skip those taking longer than 5 seconds to connect to.
So the question is: "Is there a simple way to specify/decrease a connection timeout for a WebRequest or HttpWebRequest?"
I believe that the problem is that WebRequest measures the time only after the request is actually made. If you submit multiple requests to the same address, the ServicePointManager will throttle your requests and only actually submit as many concurrent connections as the value of the corresponding ServicePoint.ConnectionLimit, which by default gets its value from ServicePointManager.DefaultConnectionLimit. The application CLR host sets this to 2, the ASP host to 10. So if you have a multithreaded application that submits multiple requests to the same host, only two are actually placed on the wire; the rest are queued up.
I have not researched this to conclusive evidence that this is what really happens, but on a similar project I had, things were horrible until I removed the ServicePoint limitation.
And yes, you must code your app around WebRequest.BeginGetRequestStream (for POSTs with content) and WebRequest.BeginGetResponse (for GETs and POSTs). Synchronous calls will not scale (I won't go into details why, but I do have hard evidence for that). In any case, the ServicePoint issue is orthogonal to this: the queueing behavior happens with async calls too.
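A bare-bones sketch of that Begin/End pattern for a POST (error handling omitted; postUrl and the payload are placeholders):

using System;
using System.IO;
using System.Net;
using System.Text;

HttpWebRequest req = (HttpWebRequest)WebRequest.Create(postUrl); // postUrl: placeholder
req.Method = "POST";
byte[] bodyBytes = Encoding.UTF8.GetBytes("<xml/>");             // placeholder payload

req.BeginGetRequestStream(reqAr =>
{
    using (Stream reqStream = req.EndGetRequestStream(reqAr))
    {
        reqStream.Write(bodyBytes, 0, bodyBytes.Length);
    }
    req.BeginGetResponse(respAr =>
    {
        using (HttpWebResponse resp = (HttpWebResponse)req.EndGetResponse(respAr))
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            string result = reader.ReadToEnd();
            // handle result
        }
    }, null);
}, null);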
Sorry for tacking on to an old thread, but I think something that was said above may be incorrect/misleading.
From what I can tell .Timeout is NOT the connection time, it is the TOTAL time allowed for the entire life of the HttpWebRequest and response. Proof:
I Set:
.Timeout=5000
.ReadWriteTimeout=32000
The connect-and-post time for the HttpWebRequest took 26 ms, but the subsequent call to HttpWebRequest.GetResponse() timed out after 4974 ms, proving that 5000 ms was the time limit for the whole send-request/get-response set of calls.
I didn't verify whether DNS name resolution was measured as part of the time, as this is irrelevant to me, since none of this works the way I really need it to work - my intention was to time out more quickly when connecting to systems that weren't accepting connections, as shown by them failing during the connect phase of the request.
For example: I'm willing to wait 30 seconds on a connection request that has a chance of returning a result, but I only want to burn 10 seconds waiting to send a request to a host that is misbehaving.
Something I found later that helped is the .ReadWriteTimeout property. This, in addition to the .Timeout property, seemed to finally cut down on the time threads would spend trying to download from a problematic server. The default for .ReadWriteTimeout is 5 minutes, which was far too long for my application.
So, it seems to me:
.Timeout = time spent trying to establish a connection (not including lookup time)
.ReadWriteTimeout = time spent trying to read or write data after connection established
More info: HttpWebRequest.ReadWriteTimeout Property
Edit:
Per #KyleM's comment, the Timeout property is for the entire connection attempt, and reading up on it at MSDN shows:
Timeout is the number of milliseconds that a subsequent synchronous request made with the GetResponse method waits for a response, and the GetRequestStream method waits for a stream. The Timeout applies to the entire request and response, not individually to the GetRequestStream and GetResponse method calls. If the resource is not returned within the time-out period, the request throws a WebException with the Status property set to WebExceptionStatus.Timeout.
(Emphasis mine.)
From the documentation of the HttpWebRequest.Timeout property:
A Domain Name System (DNS) query may take up to 15 seconds to return or time out. If your request contains a host name that requires resolution and you set Timeout to a value less than 15 seconds, it may take 15 seconds or more before a WebException is thrown to indicate a timeout on your request.
Is it possible that your DNS query is the cause of the timeout?
No matter what we tried, we couldn't get the timeout below 21 seconds when the server we were checking was down.
To work around this, we combined a TcpClient check to see if the domain was alive, followed by a separate check to see if the URL was active.
public static bool IsUrlAlive(string aUrl, int aTimeoutSeconds)
{
    try
    {
        // check the domain first
        if (IsDomainAlive(new Uri(aUrl).Host, aTimeoutSeconds))
        {
            // only now check the url itself
            var request = System.Net.WebRequest.Create(aUrl);
            request.Method = "HEAD";
            request.Timeout = aTimeoutSeconds * 1000;
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
    }
    catch
    {
    }
    return false;
}
private static bool IsDomainAlive(string aDomain, int aTimeoutSeconds)
{
    try
    {
        using (TcpClient client = new TcpClient())
        {
            var result = client.BeginConnect(aDomain, 80, null, null);
            var success = result.AsyncWaitHandle.WaitOne(TimeSpan.FromSeconds(aTimeoutSeconds));
            if (!success)
            {
                return false;
            }

            // we have connected
            client.EndConnect(result);
            return true;
        }
    }
    catch
    {
    }
    return false;
}
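Usage is then a single call per URL; for example (the URL and 5-second timeout are illustrative):

// Returns quickly when the host is down, because the TcpClient connect gives up after 5 seconds.
bool alive = IsUrlAlive("http://www.example.com", 5);
Console.WriteLine(alive ? "reachable" : "down or too slow");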