Incomplete HttpWebResponse with large data sets - c#

I have some code that downloads the content of a web page. I've used it for a while and it has never caused a problem, and mostly still doesn't. However, there is one page that is rather large (2 MB, no images) with 4 tables of 4, 20, 100 and 600 rows respectively, each about 20 columns wide.
When I try to get all the data, the request completes without any apparent errors or exceptions but only returns up to about row 60 of the 4th table, sometimes more, sometimes less. The browser finishes loading the page in about 20-30 seconds, with what look like constant flushes to the page until it is complete.
I've tried a number of solutions from SO and other searches without any different results. Below is the current code, but I've also tried: a proxy, async, no timeouts, KeepAlive set to false...
I can't use WebClient (as another far-fetched attempt) because I need to log in using the CookieContainer.
HttpWebRequest pageImport = (HttpWebRequest)WebRequest.Create(importUri);
pageImport.ReadWriteTimeout = Int32.MaxValue;
pageImport.Timeout = Int32.MaxValue;
pageImport.UserAgent = "User-Agent Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3";
pageImport.Accept = "Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
pageImport.KeepAlive = true;
pageImport.Timeout = Int32.MaxValue;
pageImport.ReadWriteTimeout = Int32.MaxValue;
pageImport.MaximumResponseHeadersLength = Int32.MaxValue;
if (null != LoginCookieContainer)
{
pageImport.CookieContainer = LoginCookieContainer;
}
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
using (WebResponse response = pageImport.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream, encode))
{
stream.Flush();
HtmlRetrieved = reader.ReadToEnd();
}

Try reading block-wise instead of calling reader.ReadToEnd().
Just to give you an idea:
// Pipe the stream to a higher level stream reader with the required encoding format.
StreamReader readStream = new StreamReader( ReceiveStream, encode );
Console.WriteLine("\nResponse stream received");
Char[] read = new Char[256];
// Read 256 characters at a time.
int count = readStream.Read( read, 0, 256 );
Console.WriteLine("HTML...\r\n");
while (count > 0)
{
// Dump the 256 characters on a string and display the string onto the console.
String str = new String(read, 0, count);
Console.Write(str);
count = readStream.Read(read, 0, 256);
}
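A minimal sketch of the same block-wise idea that collects the text instead of printing it. The class and method names (BlockReader, ReadAllInBlocks) are hypothetical and not part of the original answer. Whether this changes the truncation described in the question is not established by the thread; it mainly makes it possible to see how far the read actually got.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, shown only to illustrate block-wise reading.
public static class BlockReader
{
    // Reads the whole stream in fixed-size character blocks and returns the text.
    public static string ReadAllInBlocks(Stream source, Encoding encoding, int blockSize)
    {
        StringBuilder sb = new StringBuilder();
        char[] block = new char[blockSize];
        using (StreamReader reader = new StreamReader(source, encoding))
        {
            int count;
            // Read returns 0 only at end of stream; a short read is not the end.
            while ((count = reader.Read(block, 0, block.Length)) > 0)
            {
                sb.Append(block, 0, count);
            }
        }
        return sb.ToString();
    }
}
```

With the question's code, this would be called as BlockReader.ReadAllInBlocks(response.GetResponseStream(), encode, 256) in place of the ReadToEnd call.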

I suspect this is handled as a configuration setting on the server side. Incidentally, I think you may be setting your properties incorrectly. Remove the "User-Agent" and "Accept" prefixes from the string literals, like this:
pageImport.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3";
pageImport.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

While I'm still going to try the suggestions provided and will change my answer if it works, it seems that in this case, the problem IS the proxy. I got in front of the proxy and the code works as expected and much quicker.
I'll have to look at some proxy optimizations since this code must run behind the proxy.

Related

How to upload data by portions via HttpWebRequest

Problem:
I want to upload data in chunks via a single HTTP request and show progress after each chunk is uploaded (that is, physically sent over the internet). How I display the progress is not important for now; I can simply write something to the console.
Code:
Stack Overflow has many such questions: link 1, etc. (I cannot include more links because I do not have sufficient reputation.)
using System;
using System.Text;
using System.IO;
using System.Net;
...
public static void UploadData()
{
const string data = "simple string";
byte[] buffer = new ASCIIEncoding().GetBytes(data);
// Thanks to http://www.posttestserver.com all is working from the box
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://posttestserver.com/post.php");
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 " +
"(KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10";
req.Accept = "application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
req.Headers.Add("Accept-Language", "en-US,en;q=0.8");
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
req.ContentLength = buffer.Length;
req.SendChunked = true;
int bytesRead = buffer.Length;
const int chunkSize = 3;
Stream s = req.GetRequestStream();
for (int offset = 0; offset < bytesRead; offset += chunkSize)
{
int bytesLeft = bytesRead - offset;
int bytesWrite = bytesLeft > chunkSize ? chunkSize : bytesLeft;
s.Write(buffer, offset, bytesWrite);
}
s.Close(); // IMPORTANT: only here will all the data be sent
}
Remarks:
According to this link, data should be sent on each write to the request stream, but in reality (as can be demonstrated in Fiddler) everything is sent only when the request stream is closed or when the response is requested, not earlier. (The exact behaviour depends on the SendChunked, AllowWriteStreamBuffering and ContentLength settings, but data is never sent after each individual write to the stream.)
Question:
How can data be sent (physically) after each write (each call to the Write method)?
Constraints:
.NET 2.0;
using only the HttpWebRequest primitive (not WebClient).
Nobody answered this question here, but the user zergatul answered it on the Russian Stack Overflow, so I am posting that answer here.
The answer
It works the way you expected. I used Microsoft Network Monitor, a good utility that is free (in contrast to httpdebugger). I debugged your code on .NET 2.0.
Network Monitor shows each 3-byte send (I used a longer string).
Here the text "ple" (from "simple string") has been sent.
Remark
In the first picture the string
// ВАЖНО: только здесь будут отправлены данные через сеть
means
// IMPORTANT: only here data will be sent over the net

How to cancel large file download yet still get page source in C#?

I'm working in C# on a program to list all course resources for a MOOC (e.g. Coursera). I don't want to download the content, just get a listing of all the resources (e.g. pdf, videos, text files, sample files, etc...) which are made available to the course.
My problem lies in parsing the html source (currently using HtmlAgilityPack) without downloading all the content.
For example, if you go to this intro video for a banking course on Coursera and check the source (F12 in Chrome for Developer Tools), you can see the page source. I can stop the video download which autoplays, but still see the source.
How can I get the source in C# without downloading all the content?
I've looked in the HttpWebRequest headers (problem: time out), and DownloadDataAsync with Cancel (problem: the Completed Result object is invalid when cancelling the async request). I've also tried various Loads from HtmlAgilityPack but with no success.
Time out:
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Timeout = TIMEOUT * 1000000; //Really long
postRequest.Referer = "https://www.coursera.org";
if (headers != null)
{ /* headers here */ }
//Deal with cookies
if (cookie != null)
{ cookieJar.Add(cookie); }
postRequest.CookieContainer = cookieJar;
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse)postRequest.GetResponse();
Any tips on how to proceed?
There are at least two ways to do what you're asking. The first is to use a range GET, that is, to specify the range of the file you want to read. You do that by calling AddRange on the HttpWebRequest. So if you want, say, the first 10 kilobytes of the file, you'd write:
request.AddRange(0, 10239);
Read carefully what the documentation says about the single-argument overload. A negative value such as AddRange(-10240) is sent as "Range: bytes=-10240", which the HTTP specification defines as a suffix range (the last 10,240 bytes), so servers will typically return the end of the file rather than the beginning. There are also other overloads of AddRange that you might be interested in.
Not all servers support range gets, though. If that doesn't work, you'll have to do it another way.
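A hedged sketch of the range approach, using the two-argument AddRange(from, to) overload so that both ends of the range are stated explicitly. The class and method names (RangeRequests, CreateFirstBytesRequest, IsPartial) are hypothetical. Checking for status 206 is how a caller can tell whether the server honoured the range; a server that ignores it usually answers 200 with the full body.

```csharp
using System.Net;

// Hypothetical helper, shown only to illustrate a range GET.
public static class RangeRequests
{
    // Builds a GET request for the first byteCount bytes of the resource.
    // Creating the request does not contact the server yet.
    public static HttpWebRequest CreateFirstBytesRequest(string url, int byteCount)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        // Byte ranges are inclusive, so the first N bytes are 0..N-1.
        request.AddRange(0, byteCount - 1);
        return request;
    }

    // True when the server answered 206 Partial Content, i.e. it honoured the range.
    public static bool IsPartial(HttpWebResponse response)
    {
        return response.StatusCode == HttpStatusCode.PartialContent;
    }
}
```

If IsPartial returns false, the response most likely carries the whole resource, and the capped-read approach described next is the fallback.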
What you can do is call GetResponse and then start reading data. Once you've read as much data as you want, you can stop reading and close the stream. I've modified your sample slightly to show what I mean.
string url = "https://www.coursera.org/course/money";
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = true; //allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse) postRequest.GetResponse();
int maxBytes = 1024*1024;
int totalBytesRead = 0;
var buffer = new byte[maxBytes];
using (var s = postResponse.GetResponseStream())
{
int bytesRead;
// read up to `maxBytes` bytes from the response, appending to the buffer
while (totalBytesRead < maxBytes && (bytesRead = s.Read(buffer, totalBytesRead, maxBytes - totalBytesRead)) != 0)
{
// Here you can save the bytes read to a persistent buffer,
// or write them to a file.
Console.WriteLine("{0:N0} bytes read", bytesRead);
totalBytesRead += bytesRead;
}
}
Console.WriteLine("total bytes read = {0:N0}", totalBytesRead);
That said, I ran this sample and it downloaded about 6 kilobytes and stopped. I don't know why you're having trouble with timeouts or too much data.
Note that sometimes trying to close the stream before the entire response is read will cause the program to hang. I'm not sure why that happens at all, and I can't explain why it only happens sometimes. But you can solve it by calling request.Abort before closing the stream. That is:
using (var s = postResponse.GetResponseStream())
{
// do stuff here
// abort the request before continuing
postRequest.Abort();
}

Serializing Alternative views For MSMQ

My idea is to download an image from a URL and send it, as a LinkedResource of a mail message, to MSMQ. I can successfully download the image, but I am not able to send it to MSMQ: I need to serialize the AlternateViews, and I have not been able to do that.
Here is the code
MailMessage m = new MailMessage();
string strBody="<h1>This is sample</h1><img src=\"cid:image1\">";
m.Body = strBody;
AlternateView av1 = AlternateView.CreateAlternateViewFromString(strBody, null, MediaTypeNames.Text.Html);
Here I am downloading the image from the URL:
Stream DownloadStream = ReturnImage();
LinkedResource lr = new LinkedResource(DownloadStream, MediaTypeNames.Image.Gif);
lr.ContentId = "image1";
av1.LinkedResources.Add(lr);
m.AlternateViews.Add(av1);
private Stream ReturnImage()
{
try
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(urlForImage);
webRequest.ProtocolVersion = HttpVersion.Version10;
webRequest.KeepAlive = false;
webRequest.Timeout = 1000000000;
webRequest.ReadWriteTimeout = 1000000000;
using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
using (Stream k = webResponse.GetResponseStream())
{
// Copy the response into memory so it outlives the response object.
MemoryStream ms = new MemoryStream();
byte[] buf = new byte[1024];
int count;
while ((count = k.Read(buf, 0, buf.Length)) > 0)
{
ms.Write(buf, 0, count);
}
// Rewind so the consumer (LinkedResource) reads from the start.
ms.Position = 0;
return ms;
}
}
catch (WebException)
{
return null;
}
}
Can you give me a solution for serializing the AlternateViews so that I can send and receive them through MSMQ?
I do not think that you should take this approach.
MSMQ is designed to be lightweight, so sending large data such as images was not the intention, although it is technically possible, of course.
Also bear in mind that MSMQ has a limit of 4 MB per message. Depending on the size of your images this might become problematic (or might not).
Instead, I suggest that you save the images to a place that can be accessed by all participating applications and services, e.g. a network share, file server, or web server.
Then you send only the URIs in your MSMQ message. This will be very fast to process on both the sender and the receiver side. It will also be much lighter on the MSMQ infrastructure.
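A rough sketch of that hand-off, under the assumption that a folder reachable by both sides exists. The names ImageHandOff and SaveAndGetUri, and the folder and file name passed in, are hypothetical. The method only stores the image and returns its URI; the returned string is what would go into the queue message.

```csharp
using System;
using System.IO;

// Hypothetical helper: stores the image in a shared location and
// returns the URI that would be placed in the queue message.
public static class ImageHandOff
{
    public static string SaveAndGetUri(Stream image, string sharedFolder, string fileName)
    {
        // No-op if the folder already exists.
        Directory.CreateDirectory(sharedFolder);
        string path = Path.Combine(sharedFolder, fileName);
        using (FileStream file = File.Create(path))
        {
            byte[] buffer = new byte[8192];
            int count;
            // Copy until the source stream reports end of data.
            while ((count = image.Read(buffer, 0, buffer.Length)) > 0)
            {
                file.Write(buffer, 0, count);
            }
        }
        return new Uri(Path.GetFullPath(path)).AbsoluteUri;
    }
}
```

The URI string can then be sent as the message body, for example through MessageQueue.Send in System.Messaging, and the receiver loads the image from that location when it builds the LinkedResource.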

HttpWebRequest gets slower when adding an Interval

Testing different possibilities to download the source of a webpage I got the following results (Average time in ms to google.com, 9gag.com):
Plain HttpWebRequest: 169, 360
Gzip HttpWebRequest: 143, 260
WebClient GetStream : 132, 295
WebClient DownloadString: 143, 389
So for my 9gag client I decided to use the gzip HttpWebRequest. The problem is that, after implementing it in my actual program, the request takes more than twice as long.
The problem also occurs when I just add a Thread.Sleep between two requests.
EDIT:
I just improved the code a bit, but the problem is the same: when running in a loop, the requests take longer when I add a delay between two requests.
for(int i = 0; i < 100; i++)
{
getWebsite("http://9gag.com/");
}
Takes about 250ms per request.
for(int i = 0; i < 100; i++)
{
getWebsite("http://9gag.com/");
Thread.Sleep(1000);
}
Takes about 610ms per request.
private string getWebsite(string Url)
{
Stopwatch stopwatch = Stopwatch.StartNew();
HttpWebRequest http = (HttpWebRequest)WebRequest.Create(Url);
http.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
string html = string.Empty;
using (HttpWebResponse webResponse = (HttpWebResponse)http.GetResponse())
using (Stream responseStream = webResponse.GetResponseStream())
using (StreamReader reader = new StreamReader(responseStream))
{
html = reader.ReadToEnd();
}
Debug.WriteLine(stopwatch.ElapsedMilliseconds);
return html;
}
Any ideas to fix this problem?
Maybe give this a try, although it might only help your case of a single request and actually make things worse when doing a multithreaded version.
ServicePointManager.UseNagleAlgorithm = false;
Here's a quote from MSDN docs for the HttpWebRequest Class
Another option that can have an impact on performance is the use of
the UseNagleAlgorithm property. When this property is set to true,
TCP/IP will try to use the TCP Nagle algorithm for HTTP connections.
The Nagle algorithm aggregates data when sending TCP packets. It
accumulates sequences of small messages into larger TCP packets before
the data is sent over the network. Using the Nagle algorithm can
optimize the use of network resources, although in some situations
performance can also be degraded. Generally for constant high-volume
throughput, a performance improvement is realized using the Nagle
algorithm. But for smaller throughput applications, degradation in
performance may be seen.
An application doesn't normally need to change the default value for
the UseNagleAlgorithm property which is set to true. However, if an
application is using low-latency connections, it may help to set this
property to false.
I think you might be leaking resources, as you aren't disposing of all of your IDisposable objects on each method call.
Give this version a try and see if it gives you a more consistent execution time.
public string getWebsite( string Url )
{
Stopwatch stopwatch = Stopwatch.StartNew();
HttpWebRequest http = (HttpWebRequest) WebRequest.Create( Url );
http.Headers.Add( HttpRequestHeader.AcceptEncoding, "gzip,deflate" );
string html = string.Empty;
using ( HttpWebResponse webResponse = (HttpWebResponse) http.GetResponse() )
{
using ( Stream responseStream = webResponse.GetResponseStream() )
{
// Pick a decompressor based on Content-Encoding; fall back to the raw stream
// so an uncompressed response is still read instead of returning an empty string.
string contentEncoding = ( webResponse.ContentEncoding ?? string.Empty ).ToLowerInvariant();
Stream bodyStream = responseStream;
if ( contentEncoding.Contains( "gzip" ) )
bodyStream = new GZipStream( responseStream, CompressionMode.Decompress );
else if ( contentEncoding.Contains( "deflate" ) )
bodyStream = new DeflateStream( responseStream, CompressionMode.Decompress );
// Disposing the reader also disposes bodyStream.
using ( StreamReader reader = new StreamReader( bodyStream, Encoding.Default ) )
{
html = reader.ReadToEnd();
}
}
}
Debug.WriteLine( stopwatch.ElapsedMilliseconds );
return html;
}

Slow performance in reading from stream .NET

I have a monitoring system and I want to save a snapshot from a camera when an alarm triggers.
I have tried many methods to do that, and they all work: the snapshot is streamed from the camera and saved as a JPG on the PC (JPG format, 1280*1024, 140 KB).
But my problem is the application's performance.
The app needs about 20-30 seconds to read the stream, which is not acceptable because the method will be called every 2 seconds. I need to know what is wrong with this code and how I can make it much faster.
Many thanks in advance
Code:
string sourceURL = "http://192.168.0.211/cgi-bin/cmd/encoder?SNAPSHOT";
byte[] buffer = new byte[200000];
int read, total = 0;
WebRequest req = (WebRequest)WebRequest.Create(sourceURL);
req.Credentials = new NetworkCredential("admin", "123456");
WebResponse resp = req.GetResponse();
Stream stream = resp.GetResponseStream();
while ((read = stream.Read(buffer, total, 1000)) != 0)
{
total += read;
}
Bitmap bmp = (Bitmap)Bitmap.FromStream(new MemoryStream(buffer, 0,total));
string path = JPGName.Text+".jpg";
bmp.Save(path);
I very much doubt that this code is the cause of the problem, at least for the first method call (but read further below).
Technically, you could produce the Bitmap without saving to a memory buffer first, or if you don't need to display the image as well, you can save the raw data without ever constructing a Bitmap, but that's not going to help in terms of multiple seconds improved performance. Have you checked how long it takes to download the image from that URL using a browser, wget, curl or whatever tool, because I suspect something is going on with the encoding source.
Something you should do is clean up your resources; close the stream properly. This can potentially cause the problem if you call this method regularly, because .NET will only open a few connections to the same host at any one point.
// Make sure the stream gets closed once we're done with it
using (Stream stream = resp.GetResponseStream())
{
// A larger buffer size would be beneficial, but it's not going
// to make a significant difference.
while ((read = stream.Read(buffer, total, 1000)) != 0)
{
total += read;
}
}
I cannot test the network behavior of the WebResponse stream, but you handle the stream twice (once in your loop and once with your memory stream).
I don't think that's the whole problem, but I'd give this a try:
string sourceURL = "http://192.168.0.211/cgi-bin/cmd/encoder?SNAPSHOT";
WebRequest req = (WebRequest)WebRequest.Create(sourceURL);
req.Credentials = new NetworkCredential("admin", "123456");
WebResponse resp = req.GetResponse();
Stream stream = resp.GetResponseStream();
Bitmap bmp = (Bitmap)Bitmap.FromStream(stream);
string path = JPGName.Text + ".jpg";
bmp.Save(path);
Try to read bigger pieces of data than 1000 bytes at a time. I can see no problem with, for example,
read = stream.Read(buffer, total, buffer.Length - total);
Try this to download the file.
using(WebClient webClient = new WebClient())
{
webClient.DownloadFile("http://192.168.0.211/cgi-bin/cmd/encoder?SNAPSHOT", "c:\\Temp\\myPic.jpg");
}
You can use a DateTime to put a unique stamp on the shot.
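One possible way to build such a stamped name, as a small sketch. The class name SnapshotNaming, the method name BuildFileName and the "cam" prefix are hypothetical. Millisecond precision is included here because the question calls the method every 2 seconds.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: builds a snapshot file name that embeds a timestamp.
public static class SnapshotNaming
{
    public static string BuildFileName(string prefix, DateTime takenAt)
    {
        // Invariant culture keeps the digits and separators stable across machines.
        string stamp = takenAt.ToString("yyyyMMdd_HHmmss_fff", CultureInfo.InvariantCulture);
        return prefix + "_" + stamp + ".jpg";
    }
}
```

For example, SnapshotNaming.BuildFileName("cam", DateTime.Now) could be combined with the target folder and passed as the second argument to DownloadFile.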
