.NET HTTPReguest GetResponse() get first N bytes - c#

I'm trying to create a website crawler. it's going to retrieve some data from many websites.
some times if I load just 1000 first bytes of a webpage, I can see what I am looking for.
here is my code:
request = (HttpWebRequest)WebRequest.Create("http://example.com");
var response = (HttpWebResponse)request.GetResponse();
string responseString = new StreamReader(response.GetResponseStream()).ReadToEnd();
when I call request.GetResponse(), it will load whole the page (for example 4000 bytes) but the data that I'm looking for is in the first 1000 bytes. and when I call ReadToEnd(), it will read all the received data from RAM. but whole the data sent to my computer from the website! I don't want to receive all bytes, I need only N bytes of the first.
If I can do it so I save many many internet traffic.
can you help me? how can I do that?

Use StreamReader.Read, eg.
StreamReader sr = new StreamReader(response.GetResponseStream());
char[] c = new char[1000]; // 1000 bytes
sr.Read(c, 0, c.Length);
string responseString = new string(c);

Related

Reading content length of website

I am trying to get the content length of the web page. example http://www.google.com.
I am using c# and below is the code I used and does not give me correct length or does it. Can some one validate please.
var request = (HttpWebRequest)WebRequest.Create("http://www.google.com.au");
request.Method = "GET";
var myResponse = request.GetResponse();
var responseLength = myResponse.ContentLength;
using (var sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8))
{
var result = sr.ReadToEnd();
myResponse.Close();
}
responseLength is -1 allways but result.Length has some value, is that correct?
responseLength is -1 allways but result.Length has some value, is that correct?
Well it may be for some web sites (or some responses in some web sites) - in other cases, you'll see a non-negative value for responseLength. All you're doing is fetching the optional Content-Length HTTP header, basically... it's up to the server whether it supplies that or not.
Note that the response length, if provided, will be in bytes - whereas result.Length is in UTF-16 code units. If you want the content length in bytes, you should be reading the binary data from the stream directly rather than creating a StreamReader and reading it as text.
I think you want to DownloadString and then check the length.
Console.WriteLine(new WebClient().DownloadString("https://google.com/").Length);

content from a website in a text file

My aim is to get content from a website (for instance a league table from a sports website) and put it in a .txt file so that I can code with a local file.
I have tried multiple lines of code and others examples such as:
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)
WebRequest.Create("http://www.stackoverflow.com");
// prepare the web page we will be asking for
HttpWebRequest request = (HttpWebRequest)
WebRequest.Create("http://www.stackoverflow.com");
// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// we will read data via the response stream
Stream resStream = response.GetResponseStream();
string tempString = null;
int count = 0;
do
{
// fill the buffer with data
count = resStream.Read(buf, 0, buf.Length);
// make sure we read some data
if (count != 0)
{
// translate from bytes to ASCII text
tempString = Encoding.ASCII.GetString(buf, 0, count);
// continue building the string
sb.Append(tempString);
}
while (count > 0); // any more data to read?
}
My issue is when trying this, is that the words request and response are underlined in read and all the tokens are invalid.
Is there a better method to get content from a website to a .txt file or is there a way to fix the code supplied?
Thanks
is there a way to fix the code supplied?
The code you submitted works for me, make sure you have the proper name spaces defined.
In this case : using System.Net;
Or might it be that the duplicate creation of the variable request isn't a typo?
If so remove one of the request variables.
Is there a better method to get content from a website to a .txt file
Since you're reading all the content from the site anyway there isn't really a need for the while loop. Instead you can use the ReadToEnd method supplied by the StreamReader.
string siteContent = "";
using (StreamReader reader = new StreamReader(resStream)) {
siteContent = reader.ReadToEnd();
}
Also be sure to dispose of the WebResponse, other than that your code should work fine.

How to cancel large file download yet still get page source in C#?

I'm working in C# on a program to list all course resources for a MOOC (e.g. Coursera). I don't want to download the content, just get a listing of all the resources (e.g. pdf, videos, text files, sample files, etc...) which are made available to the course.
My problem lies in parsing the html source (currently using HtmlAgilityPack) without downloading all the content.
For example, if you go to this intro video for a banking course on Coursera and check the source (F12 in Chrome for Developer Tools), you can see the page source. I can stop the video download which autoplays, but still see the source.
How can I get the source in C# without download all the content?
I've looked in the HttpWebRequest headers (problem: time out), and DownloadDataAsync with Cancel (problem: the Completed Result object is invalid when cancelling the async request). I've also tried various Loads from HtmlAgilityPack but with no success.
Time out:
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Timeout = TIMEOUT * 1000000; //Really long
postRequest.Referer = "https://www.coursera.org";
if (headers != null)
{ //headers here }
//Deal with cookies
if (cookie != null)
{ cookieJar.Add(cookie); }
postRequest.CookieContainer = cookiejar;
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse)postRequest.GetResponse();
Any tips on how to proceed?
There are at least two ways to do what you're asking. The first is to use a range get. That is, specify the range of the file you want to read. You do that by calling AddRange on the HttpWebRequest. So if you want, say, the first 10 kilobytes of the file, you'd write:
request.AddRange(-10240);
Read carefully what the documentation says about the meaning of that parameter. If it's negative, it specifies the ending point of the range. There are also other overloads of AddRange that you might be interested in.
Not all servers support range gets, though. If that doesn't work, you'll have to do it another way.
What you can do is call GetResponse and then start reading data. Once you've read as much data as you want, you can stop reading and close the stream. I've modified your sample slightly to show what I mean.
string url = "https://www.coursera.org/course/money";
HttpWebRequest postRequest = (HttpWebRequest)WebRequest.Create(url);
postRequest.Method = "GET";
postRequest.AllowAutoRedirect = true; //allowRedirect;
postRequest.ServicePoint.Expect100Continue = true;
HttpWebResponse postResponse = (HttpWebResponse) postRequest.GetResponse();
int maxBytes = 1024*1024;
int totalBytesRead = 0;
var buffer = new byte[maxBytes];
using (var s = postResponse.GetResponseStream())
{
int bytesRead;
// read up to `maxBytes` bytes from the response
while (totalBytesRead < maxBytes && (bytesRead = s.Read(buffer, 0, maxBytes)) != 0)
{
// Here you can save the bytes read to a persistent buffer,
// or write them to a file.
Console.WriteLine("{0:N0} bytes read", bytesRead);
totalBytesRead += bytesRead;
}
}
Console.WriteLine("total bytes read = {0:N0}", totalBytesRead);
That said, I ran this sample and it downloaded about 6 kilobytes and stopped. I don't know why you're having trouble with timeouts or too much data.
Note that sometimes trying to close the stream before the entire response is read will cause the program to hang. I'm not sure why that happens at all, and I can't explain why it only happens sometimes. But you can solve it by calling request.Abort before closing the stream. That is:
using (var s = postResponse.GetResponseStream())
{
// do stuff here
// abort the request before continuing
postRequest.Abort();
}

Slow performance in reading from stream .NET

I have a monitoring system and I want to save a snapshot from a camera when alarm trigger.
I have tried many methods to do that…and it’s all working fine , stream snapshot from the camera then save it as a jpg in the pc…. picture (jpg format,1280*1024,140KB)..That’s fine
But my problem is in the application performance...
The app need about 20 ~30 seconds to read the steam, that’s not acceptable coz that method will be called every 2 second .I need to know what wrong with that code and how I can get it much faster than that. ?
Many thanks in advance
Code:
string sourceURL = "http://192.168.0.211/cgi-bin/cmd/encoder?SNAPSHOT";
byte[] buffer = new byte[200000];
int read, total = 0;
WebRequest req = (WebRequest)WebRequest.Create(sourceURL);
req.Credentials = new NetworkCredential("admin", "123456");
WebResponse resp = req.GetResponse();
Stream stream = resp.GetResponseStream();
while ((read = stream.Read(buffer, total, 1000)) != 0)
{
total += read;
}
Bitmap bmp = (Bitmap)Bitmap.FromStream(new MemoryStream(buffer, 0,total));
string path = JPGName.Text+".jpg";
bmp.Save(path);
I very much doubt that this code is the cause of the problem, at least for the first method call (but read further below).
Technically, you could produce the Bitmap without saving to a memory buffer first, or if you don't need to display the image as well, you can save the raw data without ever constructing a Bitmap, but that's not going to help in terms of multiple seconds improved performance. Have you checked how long it takes to download the image from that URL using a browser, wget, curl or whatever tool, because I suspect something is going on with the encoding source.
Something you should do is clean up your resources; close the stream properly. This can potentially cause the problem if you call this method regularly, because .NET will only open a few connections to the same host at any one point.
// Make sure the stream gets closed once we're done with it
using (Stream stream = resp.GetResponseStream())
{
// A larger buffer size would be benefitial, but it's not going
// to make a significant difference.
while ((read = stream.Read(buffer, total, 1000)) != 0)
{
total += read;
}
}
I cannot try the network behavior of the WebResponse stream, but you handle the stream twice (once in your loop and once with your memory stream).
I don't thing that's the whole problem but I'd give it a try:
string sourceURL = "http://192.168.0.211/cgi-bin/cmd/encoder?SNAPSHOT";
WebRequest req = (WebRequest)WebRequest.Create(sourceURL);
req.Credentials = new NetworkCredential("admin", "123456");
WebResponse resp = req.GetResponse();
Stream stream = resp.GetResponseStream();
Bitmap bmp = (Bitmap)Bitmap.FromStream(stream);
string path = JPGName.Text + ".jpg";
bmp.Save(path);
Try to read bigger pieces of data, than 1000 bytes per time. I can see no problem with, for example,
read = stream.Read(buffer, 0, buffer.Length);
Try this to download the file.
using(WebClient webClient = new WebClient())
{
webClient.DownloadFile("http://192.168.0.211/cgi-bin/cmd/encoder?SNAPSHOT", "c:\\Temp\myPic.jpg");
}
You can use a DateTime to put a unique stamp on the shot.

Protocol buffer net with windows phone 7

I'm trying to download a response from a server wich is in Protocol Buffer format from a Windows Phone 7 application.
I'm trying to do this with a WebClient, my problem is the following.
The WebClient has only two method for downloading
DownloadStringAsync(new Uri(url));
and
OpenReadAsync(new Uri(url));
but this two method are not good to retrieve the response because, the response size should have 16 hexadecimal caracteres ( 080118CBDDF0E104 ), but the size of the string and the stream get by the two methods are only 8.
So I'm searching a way to resolve my problem.
I found one for C#
public static T DownloadProto<T>(this WebClient client, string address)
{
byte[] data = client.DownloadData(address);
using (var ms = new MemoryStream(data))
{
return Serializer.Deserialize<T>(ms);
}
}
on
http://code.google.com/p/protobuf-net/source/browse/trunk/BasicHttp/HttpClient/ProtoWebClient.cs?spec=svn340&r=340
But this method was deleted or not yet implemented on the Windows Phone 7 development kit.
How are you reading from the stream?
If you are reading it as a string then its perhaps reading two bytes per character - instead use
var buf = new byte[16];
var actual = stream.Read(buf, 0, buf.Length);

Categories