How can I download HTML source in C#

How can I get the HTML source for a given web address in C#?

You can download files with the WebClient class:
using System.Net;
using (WebClient client = new WebClient ()) // WebClient class inherits IDisposable
{
client.DownloadFile("http://yoursite.com/page.html", @"C:\localfile.html");
// Or you can get the file content without saving it
string htmlCode = client.DownloadString("http://yoursite.com/page.html");
}

Basically:
using System; // Console
using System.IO; // StreamReader
using System.Net;
using System.Net.Http; // in LINQPad, also add a reference to System.Net.Http.dll
WebRequest req = HttpWebRequest.Create("http://google.com");
req.Method = "GET";
string source;
using (StreamReader reader = new StreamReader(req.GetResponse().GetResponseStream()))
{
source = reader.ReadToEnd();
}
Console.WriteLine(source);

The newest, most up-to-date answer
This post is really old (it was 7 years old when I answered it), so none of the other answers uses the new and recommended way, which is the HttpClient class.
HttpClient is considered the new API and should replace the old ones (WebClient and WebRequest):
string url = "page url";
HttpClient client = new HttpClient();
using (HttpResponseMessage response = client.GetAsync(url).Result)
{
using (HttpContent content = response.Content)
{
string pageContent = content.ReadAsStringAsync().Result;
}
}
For more information about how to use the HttpClient class (especially in async cases), you can refer to this question.
NOTE 1: If you want to use async/await:
string url = "page url";
HttpClient client = new HttpClient(); // ideally, create only one instance per application
using (HttpResponseMessage response = await client.GetAsync(url))
{
using (HttpContent content = response.Content)
{
string pageContent = await content.ReadAsStringAsync();
}
}
NOTE 2: If you use C# 8 features:
string url = "page url";
HttpClient client = new HttpClient();
using HttpResponseMessage response = await client.GetAsync(url);
using HttpContent content = response.Content;
string pageContent = await content.ReadAsStringAsync();
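The comment about creating only one HttpClient per application can be sketched as follows. This is a minimal sketch, not a definitive implementation: the class name PageDownloader and the method name GetHtmlAsync are made up for illustration, and the call to EnsureSuccessStatusCode is an addition that is not part of the answer above.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class PageDownloader
{
    // A single HttpClient instance shared by every request in the application
    // (hypothetical holder class, not part of the original answer).
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> GetHtmlAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            // Throws HttpRequestException for non-success status codes.
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

From an async method it would be called as: string html = await PageDownloader.GetHtmlAsync("http://yoursite.com/page.html");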

You can get the HTML source with:
var html = new System.Net.WebClient().DownloadString(siteUrl);

@cms's way is the more recent one, and the one suggested on the MS website, but I ran into a hard problem with both methods posted here, so I am posting the solution for everyone.
Problem:
If you use a URL like www.somesite.it/?p=1500, in some cases you get an internal server error (500), although www.somesite.it/?p=1500 works perfectly in a web browser.
Solution:
You have to move the parameters out of the URL. Working code:
using System.Net;
//...
using (WebClient client = new WebClient ())
{
client.QueryString.Add("p", "1500"); //add parameters
string htmlCode = client.DownloadString("http://www.somesite.it"); // the scheme is required, otherwise the address is not treated as a web URL
//...
}
See the official documentation.
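HttpClient has no QueryString property, so if you move to it, the parameters have to be put back into the URL by hand. A minimal sketch of that, assuming the values should be percent-encoded; the class name QueryUrl and the method name Build are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryUrl
{
    // Appends the given parameters to baseUrl as an escaped query string.
    public static string Build(string baseUrl, IEnumerable<KeyValuePair<string, string>> parameters)
    {
        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));

        // UriBuilder adds the leading '?' itself.
        var builder = new UriBuilder(baseUrl) { Query = query };
        return builder.Uri.AbsoluteUri;
    }
}
```

For the example above, Build("http://www.somesite.it", new[] { new KeyValuePair<string, string>("p", "1500") }) yields http://www.somesite.it/?p=1500.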

Related

How to post data with C# in mono?

1)
WebRequest request = WebRequest.Create("");
I cannot use this method, because my Mono installation cannot load System.Net.Configuration.WebRequestModulesSection.
2)
Navigate(bstrURL, &vFlags, &vTargetFrameName, &vPostData, &vHeaders);
I cannot use this method either, because I cannot use the System.Windows.Forms namespace.
What else can I use to post data to the URL?
You can use HttpClient:
var httpClient = new HttpClient();
var content = new FormUrlEncodedContent(new KeyValuePair<string, string>[0]);
var response = await httpClient.PostAsync("URL", content);
string responseAsString = await response.Content.ReadAsStringAsync();
Console.WriteLine(responseAsString);
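The snippet above posts an empty form. A minimal sketch of posting actual fields could look like this; the field names "user" and "comment" and the class name FormPoster are made up for illustration:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormPoster
{
    // Builds an application/x-www-form-urlencoded body from two hypothetical fields.
    public static FormUrlEncodedContent BuildForm(string user, string comment)
    {
        return new FormUrlEncodedContent(new[]
        {
            new KeyValuePair<string, string>("user", user),
            new KeyValuePair<string, string>("comment", comment)
        });
    }

    // Posts the form and returns the response body as a string.
    public static async Task<string> PostAsync(HttpClient client, string url, HttpContent form)
    {
        using (HttpResponseMessage response = await client.PostAsync(url, form))
        {
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

FormUrlEncodedContent takes care of encoding the values, so spaces and special characters in the fields do not need to be escaped beforehand.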

How to serialize-deserialize object using HTTP GetAsync method?

After I upgraded the framework of my web app from 4.0 to 4.6, I found that the HTTP library no longer has a ReadAsAsync() method; instead of ReadAsAsync() there is GetAsync(). I need to serialize my custom object using GetAsync().
The code using ReadAsAsync():
CustomResponse customResponse = client.ReadAsAsync("api/xxx", new StringContent(new JavaScriptSerializer().Serialize(request), Encoding.UTF8, "application/json")).Result;
Another example based on ReadAsAsync():
CustomResponse customResponse = await Response.Content.ReadAsAsync<CustomResponse>();
How can I achieve the same goal using the GetAsync() method?
You can use it this way
(you might want to run it on another thread to avoid waiting for the response):
using (HttpClient client = new HttpClient())
{
using (HttpResponseMessage response = await client.GetAsync(page))
{
using (HttpContent content = response.Content)
{
string contentString = await content.ReadAsStringAsync();
var myParsedObject = (MyObject)new JavaScriptSerializer().Deserialize(contentString, typeof(MyObject));
}
}
}
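The parsing step can also be done with a different serializer. The following is only a sketch under assumptions: it swaps JavaScriptSerializer for System.Text.Json (built into .NET Core 3.0 and later, a separate NuGet package on .NET Framework), and the MyObject properties Id and Name are invented for illustration.

```csharp
using System.Text.Json;

// Hypothetical shape of the response object; the real MyObject is not shown in the answer.
public class MyObject
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ResponseParser
{
    // Match JSON property names to C# properties regardless of casing.
    private static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    // Turns the string read with ReadAsStringAsync() into a typed object.
    public static MyObject Parse(string contentString)
    {
        return JsonSerializer.Deserialize<MyObject>(contentString, Options);
    }
}
```

Inside the innermost using block above, the call would be: var myParsedObject = ResponseParser.Parse(contentString);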

HttpPost request not working when JSON string has HTML in

If this is a duplicate of any existing question, please let me know which post has a similar situation.
I am trying to call a POST API, which actually works perfectly from REST clients like POSTMAN.
When I try to call that API from C# using HttpClient, it only works if I do not use any HTML content in the request body.
Here is my code:
HttpClient client = new HttpClient();
string baseUrl = channel.DomainName;
client.BaseAddress = new Uri(baseUrl);
client.DefaultRequestHeaders
.TryAddWithoutValidation(
"Content-Type",
"application/x-www-form-urlencoded;charset=utf-8");
const string serviceUrl = "/api/create";
var jsonString = CreateApiRequestBody(model, userId, false);
var uri = new Uri(baseUrl + serviceUrl);
try
{
HttpResponseMessage response = await client.PostAsync(uri.ToString(), new StringContent(jsonString, Encoding.UTF8, "application/json"));
if (response.IsSuccessStatusCode)
{
Stream receiveStream = response.Content.ReadAsStreamAsync().Result;
StreamReader readStream = new StreamReader(receiveStream, Encoding.UTF8);
var str = readStream.ReadToEnd();
}
...
}
And my jsonString looks like:
{
\"user_id\":\"6\",
\"description\":\"<h2 style=\\\"font-style:italic;\\\"><u><font><font>Test Test Test </font></font></u></h2>\\n\\n<p style=\\\"font-style: italic;\\\">Hi it's a Test JOB</p>\\n\\n<p> </p>\"
}
When I use plain text in description tag, the API returns a valid response, but not with the HTML content in it.
I believe I might be missing some extra header or something else.
Any help will be greatly appreciated.
Have you tried using WebUtility.HtmlEncode() method?
Where you're setting the StringContent content, try using WebUtility.HtmlEncode(jsonString) to make it API-friendly.
Like this:
using System.Net;
HttpResponseMessage response =
await client.PostAsync(
uri.ToString(),
new StringContent(WebUtility.HtmlEncode(jsonString),
Encoding.UTF8,
"application/json"));
Don't forget the using System.Net; directive.
That will give you a safe (especially for APIs) HTML string to use in your request.
Hope this helps.
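Separately from the encoding suggested above, it may be worth checking that the JSON body is still valid once HTML with quotes is embedded in it. A minimal sketch, assuming System.Text.Json is available and that the body really has just the two fields from the question; the class name JsonBody is hypothetical, and this is not what CreateApiRequestBody from the question does:

```csharp
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;

public static class JsonBody
{
    // Lets the serializer escape quotes, angle brackets and newlines by JSON rules.
    public static string Serialize(string userId, string descriptionHtml)
    {
        var payload = new Dictionary<string, string>
        {
            ["user_id"] = userId,
            ["description"] = descriptionHtml
        };
        return JsonSerializer.Serialize(payload);
    }

    // Wraps the JSON string as request content with an application/json media type.
    public static StringContent ToContent(string json)
    {
        return new StringContent(json, Encoding.UTF8, "application/json");
    }
}
```

Because the serializer produces the escaping, the HTML comes back unchanged when the receiving side parses the JSON.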

SOAP Request with HttpClient

I'm trying to reach a SOAP API using the HttpClient object. I've searched everywhere, but most people use the HttpWebRequest object, which is not supported by the DNX Core framework.
Does anyone have a working example of a SOAP request using the HttpClient object?
This image represents a simple request from this API (NuSOAP PHP):
Thank you!
EDIT :
So I was able to call the API with the following code:
Uri uri = new Uri("http://localhost/teek_api/service.php");
HttpClient hc = new HttpClient();
hc.DefaultRequestHeaders.Add("SOAPAction", "http://localhost/teek_api/service.php/ping");
var content = new StringContent("text/xml; charset=utf-8");
using (HttpResponseMessage response = await hc.PostAsync(uri, content))
{
var soapResponse = await response.Content.ReadAsStringAsync();
string value = await response.Content.ReadAsStringAsync();
return value;
}
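For reference, here is a sketch of how a SOAP envelope is typically wrapped in a StringContent with a text/xml media type. The envelope is an assumption: the real request body was only shown in the image, so the empty ping element is a placeholder, and the class name SoapRequest is hypothetical.

```csharp
using System.Net.Http;
using System.Text;

public static class SoapRequest
{
    // Placeholder envelope; the actual body expected by the API is not known here.
    private const string Envelope =
@"<?xml version=""1.0"" encoding=""utf-8""?>
<soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">
  <soap:Body>
    <ping />
  </soap:Body>
</soap:Envelope>";

    // The first argument is the body; the media type goes in the third argument.
    public static StringContent BuildPingContent()
    {
        return new StringContent(Envelope, Encoding.UTF8, "text/xml");
    }
}
```

The resulting content could then be passed to hc.PostAsync(uri, content) as in the code above.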

Windows 8 C# - retrieve a webpage source as string

There's a tutorial that actually works for the Windows 8 platform with XAML and C#: http://www.tech-recipes.com/rx/1954/get_web_page_contents_in_code_with_csharp/
Here's how:
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";
WebResponse myResponse = myRequest.GetResponse();
StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
string result = sr.ReadToEnd();
sr.Close();
myResponse.Close();
However, in Windows 8 the last two lines, which (I assume) close the connection, are flagged as errors. The code works fine without closing the connection, but can I rely on that? Why do we have to close the connection? What could go wrong if I don't? What does "closing the connection" even mean?
If you are developing for Windows 8, you should consider using asynchronous methods: they provide a better user experience and are the recommended new standard. Your code would then look like:
public async Task<string> MakeWebRequest(string url)
{
HttpClient http = new System.Net.Http.HttpClient();
HttpResponseMessage response = await http.GetAsync(url);
return await response.Content.ReadAsStringAsync();
}
Maybe they've deprecated Close() in the latest API. This should work:
HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create(URL);
myRequest.Method = "GET";
using(WebResponse myResponse = myRequest.GetResponse() )
{
using(StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8))
{
string result = sr.ReadToEnd();
}
}
The using statement will automatically dispose of your objects.
To highlight webnoob's comment:
Just to point out (for the OP's reference): you can only use using on classes that implement IDisposable (which is fine in this case).
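What using does can be sketched with a small stand-in class. TrackedResource and UsingDemo are hypothetical names used only to make the behavior visible; the point is that Dispose runs when the block is left, even if an exception is thrown inside it.

```csharp
using System;

// Stand-in for a response or reader: it only records whether Dispose was called.
public sealed class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public static class UsingDemo
{
    // Leaves the using block through an exception; Dispose is still called.
    public static TrackedResource UseAndFail()
    {
        var resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed here only so the caller can inspect the resource.
        }
        return resource;
    }

    // Roughly what the compiler generates for a using block.
    public static TrackedResource UseWithTryFinally()
    {
        var resource = new TrackedResource();
        try
        {
            // work with the resource here
        }
        finally
        {
            resource.Dispose();
        }
        return resource;
    }
}
```

In both methods the returned object reports IsDisposed as true, which is the cleanup that the explicit Close() calls were doing in the original code.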
using System.Net;
using System.Net.Http;
var httpClient = new HttpClient();
var message = new HttpRequestMessage(HttpMethod.Get, targetURL);
//message.Headers.Add(....);
//message.Headers.Add(....);
var response = await httpClient.SendAsync(message);
if (response.StatusCode == HttpStatusCode.OK)
{
//HTTP 200 OK
var requestResultString = await response.Content.ReadAsStringAsync();
}
I would recommend using the HTTP client; see the Microsoft HTTP client example.
