HttpWebRequest slows with multiple instances of application - C#

Trying to get to the bottom of this!
I have a very basic app that uses HttpWebRequest to log in, navigate to a page, and then grab that page's HTML. It then performs another web request to a third page every 5 minutes in a loop.
It's all working fine and is single-threaded (and fairly old); however, circumstances have changed and I now need to run multiple instances of this app close together (I have a .bat starting the app every 2 seconds as a temporary measure until I am able to code a new multithreaded solution).
When the first instances of the app start, everything is fine: the first request completes in ~2 seconds, the second one in about 3 seconds.
However, as more and more instances of this app run concurrently (>100), something strange starts to happen.
The first web request still takes ~2 seconds, but the second request gets delayed much more, >1 minute, up to the point of timeout. I can't think why this is. The second page is larger than the first, but nothing out of the ordinary that would take >1 minute to download.
The internet connection and hardware of this server are more than capable of handling these requests.
CookieContainer myContainer = new CookieContainer();
// buffer and builder (declared elsewhere in the full app)
byte[] buf = new byte[8192];
StringBuilder sb = new StringBuilder();

// first request is https
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://mysite.com/urlone");
request.CookieContainer = myContainer;
request.Proxy = proxy;
Console.WriteLine(System.DateTime.Now.ToLongTimeString() + " " + "Starting login request");
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
    // fill the buffer with data
    count = resStream.Read(buf, 0, buf.Length);
    // make sure we read some data
    if (count != 0)
    {
        // translate from bytes to ASCII text
        tempString = Encoding.ASCII.GetString(buf, 0, count);
        // continue building the string
        sb.Append(tempString);
    }
} while (count > 0); // any more data to read?
sb.Clear();
response.Close();
resStream.Close();

string output6;
Console.WriteLine(System.DateTime.Now.ToLongTimeString() + " " + "login request complete");
HttpWebRequest request6 = (HttpWebRequest)WebRequest.Create("http://mysite.com/page2");
request6.CookieContainer = myContainer;
response = (HttpWebResponse)request6.GetResponse();
resStream = response.GetResponseStream();
tempString = null;
count = 0;
do
{
    count = resStream.Read(buf, 0, buf.Length);
    if (count != 0)
    {
        tempString = Encoding.ASCII.GetString(buf, 0, count);
        sb.Append(tempString);
    }
} while (count > 0);
output6 = sb.ToString();
sb.Clear();
response.Close();
resStream.Close();
Any ideas? I'm not very advanced with HTTP web requests, so if someone could check I haven't made any silly code mistakes above, I'd appreciate it. I'm at a loss as to what other information I may need to include here; if I have missed anything out, please tell me and I will do my best to provide it.
Thanks in advance.
EDIT 1:
I used Fiddler to find the source of the issue. It looks like the issue lies with the application (or Windows) not sending the requests for some reason; the physical request actually takes < 1 second according to Fiddler.

Check out a few things:
ServicePointManager.DefaultConnectionLimit: if you are planning to open more than 100 connections, set this value to something like 200-300.
If possible, use HttpWebRequest.KeepAlive = true.

Try wrapping your request in a using statement to make sure it's always properly closed. If you hit the maximum number of connections, you otherwise have to wait for the earlier ones to time out before new ones can connect.
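As a rough sketch of both suggestions combined (reusing the myContainer cookie container from the question; the URL and the limit value are placeholders, not known-good numbers):
// Raise the per-host connection limit once, before issuing any requests.
ServicePointManager.DefaultConnectionLimit = 300;

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://mysite.com/urlone");
request.CookieContainer = myContainer;
request.KeepAlive = true;

// The using statements release the response and stream even if an
// exception is thrown, freeing the connection for reuse.
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream resStream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(resStream, Encoding.ASCII))
{
    string html = reader.ReadToEnd();
}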

Related

content from a website in a text file

My aim is to get content from a website (for instance a league table from a sports website) and put it in a .txt file so that I can code with a local file.
I have tried multiple lines of code and others examples such as:
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)
    WebRequest.Create("http://www.stackoverflow.com");
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)
    WebRequest.Create("http://www.stackoverflow.com");
// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// we will read data via the response stream
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
    // fill the buffer with data
    count = resStream.Read(buf, 0, buf.Length);
    // make sure we read some data
    if (count != 0)
    {
        // translate from bytes to ASCII text
        tempString = Encoding.ASCII.GetString(buf, 0, count);
        // continue building the string
        sb.Append(tempString);
    }
} while (count > 0); // any more data to read?
My issue when trying this is that the words request and response are underlined in red and all the tokens are invalid.
Is there a better method to get content from a website into a .txt file, or is there a way to fix the code supplied?
Thanks
is there a way to fix the code supplied?
The code you submitted works for me; make sure you have the proper namespaces defined.
In this case: using System.Net;
Or might it be that the duplicate declaration of the request variable isn't a typo?
If so, remove one of the request variables.
Is there a better method to get content from a website to a .txt file
Since you're reading all the content from the site anyway, there isn't really a need for the while loop. Instead you can use the ReadToEnd method supplied by StreamReader.
string siteContent = "";
using (StreamReader reader = new StreamReader(resStream)) {
siteContent = reader.ReadToEnd();
}
Also be sure to dispose of the WebResponse; other than that, your code should work fine.
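Putting those pieces together, a minimal end-to-end sketch (the URL and the output filename are placeholders) that saves the page source to a .txt file might look like this:
using System.IO;
using System.Net;

class Program
{
    static void Main()
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.stackoverflow.com");
        // The using statements dispose of the response and reader when done
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            string siteContent = reader.ReadToEnd();
            File.WriteAllText("sitecontent.txt", siteContent);
        }
    }
}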

Read endless stream with HttpWebRequest C#

I'm using C# to get data from an endless HTTP stream. I used TcpClient before, but now I want to add status-code exceptions, which is much easier to do with HttpWebResponse. I get the response, but the problem is I can't read the chunks coming from the server. Everything looks fine, but I must have missed something. The debugger shows execution is stuck at ReadLine(), and I can also see there's some data in the stream buffer.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://" + url + "/stream/" + from + "?token=" + token);
using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
using (StreamReader streamReader = new StreamReader(res.GetResponseStream(), Encoding.UTF8))
{
    // Start parsing cycle
    while (!streamReader.EndOfStream && !worker.CancellationPending)
    {
        string resultLine = streamReader.ReadLine();
        System.Diagnostics.Trace.WriteLine(resultLine);
        if (!resultLine.StartsWith("{")) continue;
        newQuote quote = JsonConvert.DeserializeObject<newQuote>(resultLine);
        worker.ReportProgress(0, quote);
    }
}
I've found the error. I was using ReadLine(), but never sent "\r\n" from the server side. I patched my server to send \r\n after each JSON object and it works fine now.
Another option is to use the Read() method. It doesn't wait for the end of a line, but you must know the chunk length.
Thanks for the help :)
Update: how to parse the string using the Read() method:
while (!streamReader.EndOfStream && !worker.CancellationPending)
{
    char[] buffer = new char[1024];
    // Read() returns the number of characters actually read;
    // use that count instead of scanning the buffer for zeroed entries
    int charsRead = streamReader.Read(buffer, 0, buffer.Length);
    string resultLine = new string(buffer, 0, charsRead);
    newQuote quote = JsonConvert.DeserializeObject<newQuote>(resultLine);
    worker.ReportProgress(0, quote);
}

Request doesn't wait for response in .NET

I have a Windows service which uploads files to another website which processes them. The problem is that with small files it works fine and gets a response, but with large files (about 6 minutes to process) it stays in a waiting mode forever.
Here is the part of external website post method code:
try
{
...
LogResults();
return string.Empty;
}
catch (Exception e)
{
return e.Message;
}
The problem is that I can see the logs even for large files, so the website is always returning a value, but for large files my Windows service doesn't wait for it.
And here is the code from windows service
var valuesp = new NameValueCollection
{
{ "AccountId", datafeed.AccountId }
};
byte[] resultp = UploadHelper.UploadFiles(url, uploadFiles, valuesp);
response = Encoding.Default.GetString(resultp);
The UploadFiles method returns a value for small files, but waits forever for large ones.
Here is the complete code of UploadFiles:
public static byte[] UploadFiles(string address, IEnumerable<UploadFile> files, NameValueCollection values)
{
    var request = WebRequest.Create(address);
    request.Timeout = System.Threading.Timeout.Infinite; //3600000; // 60 minutes
    request.Method = "POST";
    var boundary = "---------------------------" +
        DateTime.Now.Ticks.ToString("x", NumberFormatInfo.InvariantInfo);
    request.ContentType = "multipart/form-data; boundary=" + boundary;
    boundary = "--" + boundary;
    using (var requestStream = request.GetRequestStream())
    {
        // Write the values
        if (values != null)
        {
            foreach (string name in values.Keys)
            {
                var buffer = Encoding.ASCII.GetBytes(boundary + Environment.NewLine);
                requestStream.Write(buffer, 0, buffer.Length);
                buffer = Encoding.ASCII.GetBytes(string.Format("Content-Disposition: form-data; name=\"{0}\"{1}{1}", name,
                    Environment.NewLine));
                requestStream.Write(buffer, 0, buffer.Length);
                buffer = Encoding.UTF8.GetBytes(values[name] + Environment.NewLine);
                requestStream.Write(buffer, 0, buffer.Length);
            }
        }
        // Write the files
        if (files != null)
        {
            foreach (var file in files)
            {
                var buffer = Encoding.ASCII.GetBytes(boundary + Environment.NewLine);
                requestStream.Write(buffer, 0, buffer.Length);
                buffer = Encoding.UTF8.GetBytes(string.Format("Content-Disposition: form-data; name=\"{0}\"; filename=\"{1}\"{2}", file.Name,
                    file.Filename, Environment.NewLine));
                requestStream.Write(buffer, 0, buffer.Length);
                buffer = Encoding.ASCII.GetBytes(string.Format("Content-Type: {0}{1}{1}", file.ContentType,
                    Environment.NewLine));
                requestStream.Write(buffer, 0, buffer.Length);
                // file.Stream is presumably a byte[] here, despite the name
                requestStream.Write(file.Stream, 0, file.Stream.Length);
                buffer = Encoding.ASCII.GetBytes(Environment.NewLine);
                requestStream.Write(buffer, 0, buffer.Length);
            }
        }
        var boundaryBuffer = Encoding.ASCII.GetBytes(boundary + "--");
        requestStream.Write(boundaryBuffer, 0, boundaryBuffer.Length);
    }
    using (var response = request.GetResponse())
    using (var responseStream = response.GetResponseStream())
    using (var stream = new MemoryStream())
    {
        responseStream.CopyTo(stream);
        return stream.ToArray();
    }
}
What am I doing wrong here?
EDIT: Locally it works even when processing takes 7-8 minutes, but in the live environment it doesn't. Could it be related to the main app's IIS settings? Or to the Windows service server's settings?
EDIT 2: Remote server web.config httpRuntime settings:
<httpRuntime enableVersionHeader="false" maxRequestLength="300000" executionTimeout="12000" targetFramework="4.5" />
The problem was with the Azure Load Balancer, which has an idle timeout set to 4 minutes by default. A post on the Windows Azure blog says that this timeout is now configurable to a value between 4 and 30 minutes:
http://azure.microsoft.com/blog/2014/08/14/new-configurable-idle-timeout-for-azure-load-balancer/
However, the problem was solved by sending additional keep-alive bytes via TCP, which told the load balancer not to kill the requests.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(address);
...
request.Proxy = null;
request.ServicePoint.SetTcpKeepAlive(true, 30000, 5000); //after 30 seconds, each 5 second
Setting Proxy to null (not using a proxy) is a mandatory action here, because otherwise the proxy will not pass the TCP keep-alive bytes to the server.
Upload your files with a method that complies with RFC 1867.
See this
And then:
UploadFile[] files = new UploadFile[]
{
new UploadFile(fileName1),
new UploadFile(fileName2)
};
NameValueCollection form = new NameValueCollection();
form["name1"] = "value1";
form["name2"] = "xyzzy";
string response = UploadHelper.Upload(url, files, form);
That's all folks!
EDIT:
I use the method above to upload files over 100 MB in size. I don't use ASP at all, and it works just perfectly!
I had a similar issue where the upload would work locally but time out on the server. There are two things at play here; I'm assuming you're talking about IIS 7+. If not, let me know and I'll gladly delete my answer.
So first, there's ASP.NET which looks at this setting (under system.web):
<!-- maxRequestLength is in KB - 10 MB (10240 KB) here. This is needed by ASP.NET. Must match maxAllowedContentLength under system.webServer/security/requestLimits -->
<httpRuntime targetFramework="4.5" maxRequestLength="10240" />
Then there's IIS:
<system.webServer>
  <security>
    <requestFiltering>
      <!-- maxAllowedContentLength in bytes - 10MB -->
      <requestLimits maxAllowedContentLength="10485760" />
    </requestFiltering>
  </security>
</system.webServer>
You need both of these settings to be present and the limits to match for things to work. Notice that one is in KB and the other one is in bytes.
Have you checked the IIS file size limit on the remote server? It defaults to 30 MB, so if the files you are trying to upload are larger than that, it will fail. Here's how to change the upload limit (IIS7): http://www.web-site-scripts.com/knowledge-base/article/AA-00696/0/Increasing-maximum-allowed-size-for-uploads-on-IIS7.html
You can achieve this task with the AsyncCallback technique.
What's AsyncCallback?
When the async method finishes processing, the AsyncCallback method is automatically called, where post-processing statements can be executed. With this technique there is no need to poll or wait for the async thread to complete.
You can find more details at this link:
AsyncCallback
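A minimal sketch of that pattern with HttpWebRequest (the URL is a placeholder; this is one way the callback might be wired up, not the poster's actual code):
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/upload");

// BeginGetResponse returns immediately; the callback fires whenever the
// server finally answers, however long the processing takes.
request.BeginGetResponse(ar =>
{
    var req = (HttpWebRequest)ar.AsyncState;
    using (var response = (HttpWebResponse)req.EndGetResponse(ar))
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
        string body = reader.ReadToEnd();
        Console.WriteLine(body);
    }
}, request);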
Maybe it is caused by the IIS upload limit?
In IIS7 the standard value is 30MB. See MSDN
EDIT1:
Please be aware that if you are uploading multiple files in one request, the sizes add up.
EDIT2:
In all the MS examples there is always only one Stream.Write().
MSDN states: "After the Stream object has been returned, you can send data with the HttpWebRequest by using the Stream.Write method." My interpretation of this sentence is that you should call Write() only once: put all the data you want to send into a buffer and then call Write().
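If you want to try that interpretation, a sketch might buffer the whole multipart body first and hand it over in a single call (this is an assumption about a possible fix, not a verified one):
// Build the entire multipart body in memory first...
byte[] body;
using (var ms = new MemoryStream())
{
    // ...write boundaries, form values and file bytes into ms here...
    body = ms.ToArray();
}

// ...then send it to the request stream with a single Write call.
request.ContentLength = body.Length;
using (var requestStream = request.GetRequestStream())
{
    requestStream.Write(body, 0, body.Length);
}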
You could change executionTimeout="12000" to executionTimeout="360000". Note, though, that executionTimeout is specified in seconds, so 12000 already allows 200 minutes of execution, not 2; if the request dies sooner than that, the limit probably lies elsewhere.

How to cancel large file download yet still get page source in C#?

I'm working in C# on a program to list all course resources for a MOOC (e.g. Coursera). I don't want to download the content, just get a listing of all the resources (e.g. pdf, videos, text files, sample files, etc...) which are made available to the course.
My problem lies in parsing the html source (currently using HtmlAgilityPack) without downloading all the content.
For example, if you go to this intro video for a banking course on Coursera and check the source (F12 in Chrome for Developer Tools), you can see the page source. I can stop the video, which autoplays, from downloading, but I can still see the source.
How can I get the source in C# without downloading all the content?
I've looked at the HttpWebRequest headers (problem: timeout) and at DownloadDataAsync with Cancel (problem: the completed Result object is invalid when cancelling the async request). I've also tried various Load methods from HtmlAgilityPack, but with no success.
Time out:
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Timeout = TIMEOUT * 1000000; //Really long
postRequest.Referer = "https://www.coursera.org";
if (headers != null)
{
    //headers here
}
//Deal with cookies
if (cookie != null)
{
    cookieJar.Add(cookie);
}
postRequest.CookieContainer = cookieJar;
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse)postRequest.GetResponse();
Any tips on how to proceed?
There are at least two ways to do what you're asking. The first is to use a range get. That is, specify the range of the file you want to read. You do that by calling AddRange on the HttpWebRequest. So if you want, say, the first 10 kilobytes of the file, you'd write:
request.AddRange(-10240);
Read carefully what the documentation says about the meaning of that parameter. If it's negative, it specifies the ending point of the range. There are also other overloads of AddRange that you might be interested in.
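For example, one of those overloads takes an explicit start and end position (a sketch; the exact byte window is up to you):
// Request only bytes 0 through 10239 - the first 10 KB - using the
// two-argument AddRange(from, to) overload.
request.AddRange(0, 10239);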
Not all servers support range gets, though. If that doesn't work, you'll have to do it another way.
What you can do is call GetResponse and then start reading data. Once you've read as much data as you want, you can stop reading and close the stream. I've modified your sample slightly to show what I mean.
string url = "https://www.coursera.org/course/money";
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = true; //allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse) postRequest.GetResponse();
int maxBytes = 1024*1024;
int totalBytesRead = 0;
var buffer = new byte[maxBytes];
using (var s = postResponse.GetResponseStream())
{
    int bytesRead;
    // read up to `maxBytes` bytes from the response
    while (totalBytesRead < maxBytes && (bytesRead = s.Read(buffer, 0, maxBytes)) != 0)
    {
        // Here you can save the bytes read to a persistent buffer,
        // or write them to a file.
        Console.WriteLine("{0:N0} bytes read", bytesRead);
        totalBytesRead += bytesRead;
    }
}
Console.WriteLine("total bytes read = {0:N0}", totalBytesRead);
That said, I ran this sample and it downloaded about 6 kilobytes and stopped. I don't know why you're having trouble with timeouts or too much data.
Note that sometimes trying to close the stream before the entire response has been read will cause the program to hang. I'm not sure why that happens at all, and I can't explain why it only happens sometimes, but you can solve it by calling request.Abort before closing the stream. That is:
using (var s = postResponse.GetResponseStream())
{
    // do stuff here

    // abort the request before continuing
    postRequest.Abort();
}

C# - WebRequest Doesn't Return Different Pages

Here's the purpose of my console program: make a web request > save the results from the web request > use the query string to get the next page from the web request > save those results > use the query string to get the next page from the web request, etc.
So here's some pseudocode for how I set the code up:
for (int i = 0; i < 3; i++)
{
    strPageNo = Convert.ToString(i);
    //creates the url I want, with incrementing pages
    strURL = "http://www.website.com/results.aspx?page=" + strPageNo;
    //makes the web request
    wrGETURL = WebRequest.Create(strURL);
    //gets the web page for me
    objStream = wrGETURL.GetResponse().GetResponseStream();
    //for reading web page
    objReader = new StreamReader(objStream);
    //--------
    // -snip- code that saves it to file, etc.
    //--------
    //close the reader before its underlying stream
    objReader.Close();
    objStream.Close();
    //so the server doesn't get hammered
    System.Threading.Thread.Sleep(1000);
}
Pretty simple, right? The problem is that even though it increments the page number to get a different web page, I'm getting the exact same results page each time the loop runs.
i IS incrementing correctly, and I can cut and paste the URL strURL creates into a web browser and it works just fine.
I can manually type in &page=1, &page=2, &page=3, and it will return the correct pages. Somehow putting the increment in there screws it up.
Does it have anything to do with sessions, or what? I make sure I close both the stream and the reader before it loops again...
Have you tried creating a new WebRequest object each time through the loop? It could be that the Create() method isn't adequately flushing out all of its old data.
Another thing to check is that the ResponseStream is adequately flushed out before the next loop iteration.
This code works fine for me:
var urls = new[] { "http://www.google.com", "http://www.yahoo.com", "http://www.live.com" };
foreach (var url in urls)
{
    WebRequest request = WebRequest.Create(url);
    using (Stream responseStream = request.GetResponse().GetResponseStream())
    using (Stream outputStream = new FileStream("file" + DateTime.Now.Ticks.ToString(), FileMode.Create, FileAccess.Write, FileShare.None))
    {
        const int chunkSize = 1024;
        byte[] buffer = new byte[chunkSize];
        int bytesRead;
        while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] actual = new byte[bytesRead];
            Buffer.BlockCopy(buffer, 0, actual, 0, bytesRead);
            outputStream.Write(actual, 0, actual.Length);
        }
    }
    Thread.Sleep(1000);
}
Just a suggestion: try disposing of the Stream and the Reader. I've seen some weird cases where not disposing objects like these and using them in loops can yield some wacky results...
That URL doesn't quite make sense to me unless you are using MVC or something else that can interpret the query string correctly.
http://www.website.com/results.aspx&page=
should be:
http://www.website.com/results.aspx?page=
Some browsers will accept poorly formed URLs and render them fine; others may not, which may be the problem with your console app.
Here's my terrible, hackish workaround solution:
Make another console app that calls THIS one, in which the first console app passes an argument at the end of strURL. It works, but I feel so dirty.
