How to get HTML content of webpage from ASP.NET - c#

I would like to scrape some content from a dynamic web page (it seems to be built with MVC).
The scraping logic is done with HTML Agility Pack, but the issue now is that
the HTML returned when I request the URL from a browser differs from the response I get for the same URL through an ASP.NET web request.
The browser response contains the dynamic data I need (it is rendered based on the value passed in the query string), but the WebResponse result does not.
Could you please help me get the actual content of the dynamic web page via WebRequest?
Below is the code I used to read it:
WebRequest request = WebRequest.Create(sURL);
request.Method = "Get";
//Get the response
WebResponse response = request.GetResponse();
//Read the stream from the response
StreamReader reader = new StreamReader(response.GetResponseStream(), System.Text.Encoding.UTF8);

To get the content of any web page using HttpWebRequest...
// We will store the html response of the request here
string siteContent = string.Empty;
// The url you want to grab
string url = "http://google.com";
// Here we're creating our request, we haven't actually sent the request to the site yet...
// we're simply building our HTTP request to shoot off to google...
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
request.AutomaticDecompression = DecompressionMethods.GZip;
// Right now... this is what our HTTP Request has been built in to...
/*
GET http://google.com/ HTTP/1.1
Host: google.com
Accept-Encoding: gzip
Connection: Keep-Alive
*/
// Wrap everything that can be disposed in using blocks...
// They dispose of objects and prevent them from lying around in memory...
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) // Go query google
using (Stream responseStream = response.GetResponseStream()) // Load the response stream
using (StreamReader streamReader = new StreamReader(responseStream)) // Load the stream reader to read the response
{
    siteContent = streamReader.ReadToEnd(); // Read the entire response and store it in the siteContent variable
}
// magic...
Console.WriteLine(siteContent);

Related

How To Get Raw HttpWebRequest (c#)

We have a program that has been running for years making API calls to a web server using HttpWebRequest and yesterday it started giving an error (something like "connection forcibly closed by remote host"). The request works just fine when made through a web browser so I would love to be able to see the difference in requests. With the Firefox developer console, I can see the raw request that is made through the browser (that works) and I need to compare that to the http request that is made from our program. It seems like it should be simple (and very useful) to stream the request out to a string or a file so I can look at it (but I have not had any luck finding how to do that).
Can you tell me how to modify the below code to store the request that HttpWebRequest would send to a file or a string (instead of a network stream)?
public string Post(string uri, string data, string contentType, string method = "POST")
{
byte[] dataBytes = Encoding.UTF8.GetBytes(data);
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
request.ContentLength = dataBytes.Length;
request.ContentType = contentType;
request.Method = method;
using(Stream requestBody = request.GetRequestStream())
{
requestBody.Write(dataBytes, 0, dataBytes.Length);
}
using(HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using(Stream stream = response.GetResponseStream())
using(StreamReader reader = new StreamReader(stream))
{
return reader.ReadToEnd();
}
}
Logging the request is not as simple as it sounds, because you also need to capture all of the headers.
If you can't debug or modify your code, I suggest using an HTTP sniffer to log the traffic.
In addition, you can catch the WebException and read the raw error message from the server. It may give you an idea of what the problem is.
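try
{
    // ... send the request and read the response as before ...
}
catch (WebException ex)
{
    // ex.Response is null when no HTTP response was received at all.
    if (ex.Response != null)
    {
        using (var stream = ex.Response.GetResponseStream())
        using (var reader = new StreamReader(stream))
        {
            var responseString = reader.ReadToEnd();
        }
    }
}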
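As a rough way to compare against the request the browser sends, the headers already set on the HttpWebRequest object can be written to a string before the request goes out. This is only a sketch under assumptions: RequestDump and Describe are hypothetical names, and headers that the framework adds at send time (Host or Connection, for instance) may not show up here, so a sniffer remains the more reliable view of what is actually on the wire.

```csharp
using System;
using System.Net;
using System.Text;

public static class RequestDump
{
    // Builds an approximate text view of the request line, the headers set so far, and the body.
    // Nothing is sent over the network by this method.
    public static string Describe(HttpWebRequest request, string body)
    {
        var sb = new StringBuilder();
        sb.AppendLine(request.Method + " " + request.RequestUri.PathAndQuery + " HTTP/" + request.ProtocolVersion);
        foreach (string name in request.Headers.AllKeys)
        {
            sb.AppendLine(name + ": " + request.Headers[name]);
        }
        sb.AppendLine();
        sb.Append(body);
        return sb.ToString();
    }
}
```

Calling RequestDump.Describe(request, data) just before GetRequestStream() in the Post method above and writing the result to a file would give something to diff against the request shown in the Firefox developer console.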

Get auth token using post request

I am trying to get an auth token by making a POST web request to a URL. The API expects username/password credentials in the form-data payload.
When I click the sign-in option in the browser, the network logs show a GET request with HTML as the response, followed by a POST request whose form-data payload contains the username, the password, and a request token.
Trying to reproduce this flow with WebRequest, I am doing a simple POST request like the following:
public string HttpPost(string url, string post, string refer = "")
{
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
// request.CookieContainer = cJar;
request.UserAgent = UserAgent;
request.KeepAlive = false;
request.Method = "POST";
request.Referer = refer;
byte[] postBytes = Encoding.ASCII.GetBytes(post);
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postBytes.Length;
Stream requestStream = request.GetRequestStream();
requestStream.Write(postBytes, 0, postBytes.Length);
requestStream.Close();
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader sr = new StreamReader(response.GetResponseStream());
return sr.ReadToEnd();
}
However, this request only returns the text/HTML markup of the page as the first part of the request of the browser does. How do I get it to run the subsequent POST to fetch the token from the endpoint?
EDIT 1:
Here is the first GET request: [not included]
The token is a CSRF token. What you need to do is find the login form in the HTML response you received from your initial GET request, and make sure you store the cookies set in that response.
You then need to search the HTML response for the hidden input named 'token' next to the username and password input fields, and use the value of that element to compose your POST request.
Doing this programmatically is possible with a regex or with HtmlAgilityPack to extract the token.
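The steps above might be sketched roughly as follows. This is an illustration under assumptions, not the site's actual contract: CsrfLogin, ExtractInputValue and Login are hypothetical names, the form field names username, password and token are taken from the description above, and the login form is assumed to post back to the same URL. The regex approach is fragile on unusual markup; HtmlAgilityPack would be the more robust choice.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class CsrfLogin
{
    // Returns the value attribute of the <input> whose name matches, or null if none is found.
    public static string ExtractInputValue(string html, string fieldName)
    {
        foreach (Match tag in Regex.Matches(html, "<input\\b[^>]*>", RegexOptions.IgnoreCase))
        {
            if (Attr(tag.Value, "name") == fieldName)
            {
                return WebUtility.HtmlDecode(Attr(tag.Value, "value"));
            }
        }
        return null;
    }

    // Reads one quoted attribute value out of a single tag, or null if the attribute is absent.
    private static string Attr(string tag, string attribute)
    {
        Match m = Regex.Match(
            tag,
            "(?<![\\w-])" + Regex.Escape(attribute) + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            RegexOptions.IgnoreCase);
        if (!m.Success) return null;
        return m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
    }

    // GET the login page, pull out the token, then POST the form with the same cookies.
    public static string Login(string loginUrl, string user, string password)
    {
        var cookies = new CookieContainer(); // shared so the session cookie from the GET is sent with the POST

        var get = (HttpWebRequest)WebRequest.Create(loginUrl);
        get.CookieContainer = cookies;
        string html;
        using (var response = (HttpWebResponse)get.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            html = reader.ReadToEnd();
        }

        string token = ExtractInputValue(html, "token");
        if (token == null) throw new InvalidOperationException("No 'token' input found in the login page.");

        string form = "username=" + Uri.EscapeDataString(user)
                    + "&password=" + Uri.EscapeDataString(password)
                    + "&token=" + Uri.EscapeDataString(token);
        byte[] body = Encoding.UTF8.GetBytes(form);

        var post = (HttpWebRequest)WebRequest.Create(loginUrl);
        post.CookieContainer = cookies;
        post.Method = "POST";
        post.ContentType = "application/x-www-form-urlencoded";
        post.ContentLength = body.Length;
        using (Stream stream = post.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (var response = (HttpWebResponse)post.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Sharing one CookieContainer between both requests is the important design point: without it, the server has no way to tie the posted token to the session that issued it.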

Send file with curl command in C#

I have to send a file using the following cURL command:
curl -X POST -F "csv[file]=@/mypath.csv" https://mylogin:mypassword@the-server.net
So I tried with an HttpClient:
var client = new HttpClient();
// Create the HttpContent for the form to be posted.
var requestContent = new FormUrlEncodedContent(new[] { new KeyValuePair<string, string>("csv[file]", $@"@/{this.pathFile}")});
// Get the response.
HttpResponseMessage response = await client.PostAsync($@"https://{this.login}:{this.password}@myserver.net",requestContent);
// Get the response content.
HttpContent responseContent = response.Content;
// Get the stream of the content.
using (var reader = new StreamReader(await responseContent.ReadAsStreamAsync()))
{
// Write the output.
var testResult= await reader.ReadToEndAsync();
}
Or with the following code :
WebRequest request = WebRequest.Create($@"https://{this.login}:{this.password}@myserver.net");
// Set the Network credentials
request.Credentials = CredentialCache.DefaultCredentials;
request.Method = "POST";
// Create POST data and convert it to a byte array.
string postData = $@"csv[file]=@/{this.pathFile}";
byte[] byteArray = Encoding.UTF8.GetBytes(postData);
request.ContentType = "application/x-www-form-urlencoded";
// Set the ContentLength property of the WebRequest.
request.ContentLength = byteArray.Length;
using (Stream dataStream = request.GetRequestStream())
{
// Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length);
}
using (WebResponse response2 = request.GetResponse())
{
// Display the status.
Console.WriteLine(((HttpWebResponse)response2).StatusDescription);
// Get the stream containing content returned by the server.
using (StreamReader reader = new StreamReader(response2.GetResponseStream()))
{
Console.WriteLine(reader.ReadToEnd());
}
}
But each time, the result is 401 Unauthorized. Of course, my credentials are correct...
[EDIT]
This is for a professional project. The server that receives the file belongs to a partner. The cURL command is imposed on me and I have no control over this server.
[EDIT 2]
I did an analysis with Wireshark.
IP 229 is the partner's server.
IP 160 is my computer.
All I see is an encrypted alert. I also tested with HTTP instead of HTTPS, but I get the same message.
It looks like the file you are accessing is in a location that the user has no right to access. Try granting read and write permissions on the folder that contains the file.
That could solve your issue.
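Separately from file permissions, it may be worth checking that the C# request has the same shape as the cURL one. cURL's -F option sends a multipart/form-data body containing the file's contents, and credentials embedded in the URL are sent as HTTP Basic authentication, whereas the attempts above send a URL-encoded string. A minimal sketch of an equivalent upload, assuming the partner server accepts Basic authentication (CurlLikeUpload, BasicToken and UploadCsvAsync are hypothetical names):

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class CurlLikeUpload
{
    // Base64 of "login:password", the value used by the HTTP Basic authentication scheme.
    public static string BasicToken(string login, string password)
    {
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(login + ":" + password));
    }

    // Posts the file as the multipart form field "csv[file]" and returns the response body.
    public static async Task<string> UploadCsvAsync(string url, string login, string password, string filePath)
    {
        using (var client = new HttpClient())
        using (var form = new MultipartFormDataContent())
        using (var file = new StreamContent(File.OpenRead(filePath)))
        {
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", BasicToken(login, password));

            // Field name, then the file name reported to the server.
            form.Add(file, "csv[file]", Path.GetFileName(filePath));

            HttpResponseMessage response = await client.PostAsync(url, form);
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Here the URL would be passed without the login and password in it, since they travel in the Authorization header instead.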

Get full url from shorten url in C#.net

I am developing an application in which I need to capture basic details such as the title, description and images of a website, based on a URL provided by the user.
But the user may enter www.google.com instead of http://www.google.com, and the C# code below fails to retrieve data for "www.google.com":
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(new Uri(url));
request.Method = WebRequestMethods.Http.Get;
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
StreamReader reader = new StreamReader(response.GetResponseStream());
String responseString = reader.ReadToEnd();
response.Close();
It fails with an error like "Invalid URI: The format of the URI could not be determined."
So do you know any technique to find the full URL based on a shortened URL?
For example: google.com or www.google.com
Expected output: http://www.google.com or https://www.google.com
PS: I found an online web tool (http://urlex.org/) that returns the full URL based on a shortened URL.
Thanks in advance.
You can use UriBuilder to create a URL with HTTP as default scheme:
UriBuilder urb = new UriBuilder("www.google.com");
Uri uri = urb.Uri;
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(uri);
request.Method = WebRequestMethods.Http.Get;
string responseString;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
responseString = reader.ReadToEnd();
}
}
If your URL contains a scheme, it will use that one instead of the default HTTP scheme. I have also used using to release all unmanaged resources.
"So do you know any technique to find the full URL based on a shortened URL?"
I may have misunderstood your issue here, but can't you just prepend "http://" if it's missing?
string url = "www.google.com";
if (!url.StartsWith("http"))
url = $"http://{url}";
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create(new Uri(url));
request.Method = WebRequestMethods.Http.Get;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
String responseString = reader.ReadToEnd();
}
This is basically what a web browser does when you don't specify any protocol.
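One caveat with a plain StartsWith("http") test is that a host such as httpbin.org also starts with "http" and would be left without a scheme. A slightly stricter variant, sketched here with the hypothetical names UrlNormalizer and EnsureScheme, checks for the full "http://" or "https://" prefix and ignores case:

```csharp
using System;

public static class UrlNormalizer
{
    // Returns the input unchanged if it already starts with http:// or https://,
    // otherwise prepends http:// as the default scheme.
    public static string EnsureScheme(string url)
    {
        string trimmed = url.Trim();
        bool hasScheme =
            trimmed.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            trimmed.StartsWith("https://", StringComparison.OrdinalIgnoreCase);
        return hasScheme ? trimmed : "http://" + trimmed;
    }
}
```

The result can then be passed to new Uri(...) as in the code above.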

C# HttpWebRequest.Write does not send content

Trying to send a web request with some body content. The important part is that I need some data in the body of the post request. My understanding of how to do this is to open a WebRequestStream, and then write the bytes to it, then to close it. This is supposed to be simple. Here is my code:
HttpWebRequest request;
request = (HttpWebRequest)WebRequest.Create("http://localhost:50203/api/Values");//
request.Method = "POST";
byte[] requestBody = ASCIIEncoding.ASCII.GetBytes(HttpUtility.UrlEncode("grant_type=client_credentials"));
Stream requestBodyStream = request.GetRequestStream();
requestBodyStream.Write(requestBody, 0, requestBody.Length);
requestBodyStream.Flush();
requestBodyStream.Close();
WebResponse response = (HttpWebResponse)request.GetResponse();
Stream dataStream = response.GetResponseStream();
StreamReader reader = new StreamReader(dataStream);
myString = reader.ReadToEnd();
But the RequestBodyStream.Write method is not sending anything in the body. I know this because I'm running the server side program at the other end.
I also tried to do this with a StreamWriter instead of using a byte stream, and I get the same result. No matter how I do it, there is no content in the body.
My understanding is that closing the stream is what sends the actual data. I also tried adding a Flush() method to the stream.
Why is this method not producing any body?
Add 'ContentType' and 'ContentLength' headers to the request instance:
request.ContentType = "application/json"; // Or whatever you want
request.ContentLength = requestBody.Length;
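For a form body like the one in the question, a matching content type would be application/x-www-form-urlencoded. It may also matter that HttpUtility.UrlEncode is applied to the whole string there, which appears to encode the = separator as well, so the server would not see a grant_type field. A sketch that encodes only the names and values (FormPost, BuildFormBody and Send are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

public static class FormPost
{
    // Encodes each name and value separately, leaving the '=' and '&' separators intact.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Sends the form as a POST body and returns the response text.
    public static string Send(string url, IEnumerable<KeyValuePair<string, string>> fields)
    {
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));

        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the question's data, the fields argument would hold a single pair, grant_type and client_credentials.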
