How can i get the html of the page? [duplicate]

How can i get the html of the page? [duplicate] - c#

This question already has answers here:
How to get Url Hash (#) from server side
(6 answers)
Closed 9 years ago.
Here is the code I use to perform a web request. I'm getting all of the HTML except for the comments section in the URL.
HttpWebRequest req = (HttpWebRequest) HttpWebRequest.Create(
"http://u-handbag.typepad.com/uhandblog/2013/11/choosing-bag-fabrics.html#comment-6a00d8341c574653ef019b022fc96f970d"
);
StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream());
htl = reader.ReadToEnd();
Can anyone explain why?

Use this chunk of code. Variable result should have the html code.
System.Net.WebClient webClient = new System.Net.WebClient();
string result = webClient.DownloadString(URL);

Getting HTML code from a website page. You can use code like this.
string urlAddress = "http://u-handbag.typepad.com/uhandblog/2013/11/choosing-bag-fabrics.html#comment-6a00d8341c574653ef019b022fc96f970d";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
Stream receiveStream = response.GetResponseStream();
StreamReader readStream = null;
if (response.CharacterSet == null)
readStream = new StreamReader(receiveStream);
else
readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
string data = readStream.ReadToEnd();
response.Close();
readStream.Close();
}
or better to use WebClient
using System.Net;
using (WebClient client = new WebClient())
{
string htmlCode = client.DownloadString("http://u-handbag.typepad.com/uhandblog/2013/11/choosing-bag-fabrics.html#comment-6a00d8341c574653ef019b022fc96f970d");
}

Related

Downloading HTML page in C# and loading it in memory fastest way

I have written a following code which downloads a page from a given URL:
string html = string.Empty;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AutomaticDecompression = DecompressionMethods.GZip;
request.Proxy = null;
request.ServicePoint.UseNagleAlgorithm = false;
request.ServicePoint.Expect100Continue = false;
request.Method = "GET";
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
html = reader.ReadToEnd();
}
But this takes about 5-8 seconds to download the HTML file which is quite quite slow. My question here is, is there any way to improve this code, or use some other piece of code/library that can perform the HTML download for a given URL faster than this one?
Can someone help me out ?

Why not use an httpclient then write the result to a file that way?
using (HttpClient client = new HttpClient())
{
using (HttpRequestMessage request = new HttpRequestMessage())
{
request.Method = HttpMethod.Get;
request.RequestUri = new Uri("http://www.google.com", UriKind.Absolute);
using (HttpResponseMessage response = await client.SendAsync(request))
{
if (response.IsSuccessStatusCode)
{
if (response.Content != null)
{
var result = await response.Content.ReadAsStringAsync();
// write result to file
}
}
}
}
}

Downloading html source having SWF content using httpwebrequest or webclient

while downloading the html source code of http://www.orientalcuisines.in/ this site, In result I am getting only script data not whole html content,
I tried using webclient as well as httpwebrequest.
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://" + website);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
Stream receiveStream = response.GetResponseStream();
StreamReader readStream = null;
if (response.CharacterSet == null)
{
readStream = new StreamReader(receiveStream);
}
else
{
readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
}
string data = readStream.ReadToEnd();
response.Close();
readStream.Close();
}
OR
using (WebClient client = new WebClient())
{
client.DownloadFile(website);
}

Getting content from httpwebresponse exception

My remote server is throwing a web exception as bad request. But I know there is more information included in the error than what I get. If I look at the details from the exception it does not list the actual content of the response. I can see the content-type, content-length and content-encoding only. If I run this same message through another library (such as restsharp) I will see detailed exception information from the remote server. How can I get more details from the response since I know the remote server is sending them?
static string getXMLString(string xmlContent, string url)
{
//string Url;
string sResult;
//Url = ConfigurationManager.AppSettings["UserURl"] + url;
var httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
httpWebRequest.ContentType = "application/xml";
httpWebRequest.Method = "POST";
using (var streamWriter = new StreamWriter(httpWebRequest.GetRequestStream()))
{
streamWriter.Write(xmlContent);
streamWriter.Flush();
streamWriter.Close();
var httpResponse = (HttpWebResponse)httpWebRequest.GetResponse();
using (var streamReader = new StreamReader(httpResponse.GetResponseStream()))
{
var result = streamReader.ReadToEnd();
sResult = result;
}
}
return sResult;
}

EDIT : Have you tried with a simple try-catch to see if you can get more details ?
try
{
var response = (HttpWebResponse)(request.GetResponse());
}
catch(Exception ex)
{
var response = (HttpWebResponse)ex.Response;
}
In my recherches in an answer for you, I noticed that in code there was something about encoding, that you didn't specified. Look here for exemple with such code.
var encoding = ASCIIEncoding.ASCII;
using (var reader = new System.IO.StreamReader(response.GetResponseStream(), encoding))
{
string responseText = reader.ReadToEnd();
}
Or here, in the doc, also.
// Creates an HttpWebRequest with the specified URL.
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
// Sends the HttpWebRequest and waits for the response.
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
// Gets the stream associated with the response.
Stream receiveStream = myHttpWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
// Pipes the stream to a higher level stream reader with the required encoding format.
StreamReader readStream = new StreamReader( receiveStream, encode );
Console.WriteLine("\r\nResponse stream received.");
Have you tried with such ?

Process URL and Get data

I have a URL which returns array of data.
For example if I have the URL http://test.com?id=1 it will return values like 3,5,6,7 etc…
Is there any way that I can process this URL and get the returned values without going to browser (to process the url within the application)?
Thanks

Really easy:
using System.Net;
...
var response = new WebClient().DownloadString("http://test.com?id=1");

string urlAddress = "YOUR URL";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(urlAddress);
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode == HttpStatusCode.OK)
{
Stream receiveStream = response.GetResponseStream();
StreamReader readStream = null;
if (response.CharacterSet == null)
readStream = new StreamReader(receiveStream);
else
readStream = new StreamReader(receiveStream, Encoding.GetEncoding(response.CharacterSet));
string data = readStream.ReadToEnd();
response.Close();
readStream.Close();
}
This should do the trick.
It returns the html source code of the homepage and fills it into the string data.
You can use that string now.
Source: http://www.codeproject.com/Questions/204778/Get-HTML-code-from-a-website-C

This is a simple function I constantly use for similar purposes (VB.NET):
Public Shared Function GetWebData(url As String) As String
Try
Dim request As WebRequest = WebRequest.Create(url)
Dim response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
Dim dataStream As Stream = response.GetResponseStream()
Dim readStream As StreamReader = New StreamReader(dataStream)
Dim data = readStream.ReadToEnd()
readStream.Close()
dataStream.Close()
response.Close()
Return data
Catch ex As Exception
Return ""
End Try
End Function
To use it, pass it the URL and it will return the contents of the URL.

Read text from response

HttpWebRequest request = WebRequest.Create("http://google.com") as HttpWebRequest;
request.Accept = "application/xrds+xml";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
WebHeaderCollection header = response.Headers;
Here google returns text. How to read it?

Your "application/xrds+xml" was giving me issues, I was receiving a Content-Length of 0 (no response).
After removing that, you can access the response using response.GetResponseStream().
HttpWebRequest request = WebRequest.Create("http://google.com") as HttpWebRequest;
//request.Accept = "application/xrds+xml";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
WebHeaderCollection header = response.Headers;
var encoding = ASCIIEncoding.ASCII;
using (var reader = new System.IO.StreamReader(response.GetResponseStream(), encoding))
{
string responseText = reader.ReadToEnd();
}

The accepted answer does not correctly dispose the WebResponse or decode the text. Also, there's a new way to do this in .NET 4.5.
To perform an HTTP GET and read the response text, do the following.
.NET 1.1 ‒ 4.0
public static string GetResponseText(string address)
{
var request = (HttpWebRequest)WebRequest.Create(address);
using (var response = (HttpWebResponse)request.GetResponse())
{
var encoding = Encoding.GetEncoding(response.CharacterSet);
using (var responseStream = response.GetResponseStream())
using (var reader = new StreamReader(responseStream, encoding))
return reader.ReadToEnd();
}
}
.NET 4.5
private static readonly HttpClient httpClient = new HttpClient();
public static async Task<string> GetResponseText(string address)
{
return await httpClient.GetStringAsync(address);
}

I've just tried that myself, and it gave me a 200 OK response, but no content - the content length was 0. Are you sure it's giving you content? Anyway, I'll assume that you've really got content.
Getting actual text back relies on knowing the encoding, which can be tricky. It should be in the Content-Type header, but then you've got to parse it etc.
However, if this is actually XML (e.g. from "http://google.com/xrds/xrds.xml"), it's a lot easier. Just load the XML into memory, e.g. via LINQ to XML. For example:
using System;
using System.IO;
using System.Net;
using System.Xml.Linq;
using System.Web;
class Test
{
static void Main()
{
string url = "http://google.com/xrds/xrds.xml";
HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
XDocument doc;
using (WebResponse response = request.GetResponse())
{
using (Stream stream = response.GetResponseStream())
{
doc = XDocument.Load(stream);
}
}
// Now do whatever you want with doc here
Console.WriteLine(doc);
}
}
If the content is XML, getting the result into an XML object model (whether it's XDocument, XmlDocument or XmlReader) is likely to be more valuable than having the plain text.

This article gives a good overview of using the HttpWebResponse object:How to use HttpWebResponse
Relevant bits below:
HttpWebResponse webresponse;
webresponse = (HttpWebResponse)webrequest.GetResponse();
Encoding enc = System.Text.Encoding.GetEncoding(1252);
StreamReader loResponseStream = new StreamReader(webresponse.GetResponseStream(),enc);
string Response = loResponseStream.ReadToEnd();
loResponseStream.Close();
webresponse.Close();
return Response;

HttpWebRequest request = (HttpWebRequest)WebRequest.Create("https://www.google.com");
request.Method = "GET";
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
string strResponse = reader.ReadToEnd();

response.GetResponseStream() should be used to return the response stream. And don't forget to close the Stream and Response objects.

If you http request is Post and request.Accept = "application/x-www-form-urlencoded";
then i think you can to get text of respone by code bellow:
var contentEncoding = response.Headers["content-encoding"];
if (contentEncoding != null && contentEncoding.Contains("gzip")) // cause httphandler only request gzip
{
// using gzip stream reader
using (var responseStreamReader = new StreamReader(new GZipStream(response.GetResponseStream(), CompressionMode.Decompress)))
{
strResponse = responseStreamReader.ReadToEnd();
}
}
else
{
// using ordinary stream reader
using (var responseStreamReader = new StreamReader(response.GetResponseStream()))
{
strResponse = responseStreamReader.ReadToEnd();
}
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How can i get the html of the page? [duplicate] - c#

Use this chunk of code. Variable result should have the html code. System.Net.WebClient webClient = new System.Net.WebClient(); string result = webClient.DownloadString(URL);

Related

Downloading HTML page in C# and loading it in memory fastest way

Downloading html source having SWF content using httpwebrequest or webclient

Getting content from httpwebresponse exception

Process URL and Get data

Read text from response

Categories

Resources