I have a project at work that requires me to be able to enter information into a web page, read the next page I get redirected to, and then take further action. A simplified real-world example would be something like going to google.com, entering "Coding tricks" as search criteria, and reading the resulting page.
Small coding examples like the ones linked to at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx tell how to read a web page, but not how to interact with it by submitting information into a form and continuing on to the next page.
For the record, I'm not building a malicious and/or spam related product.
So how do I go read web pages that require a few steps of normal browsing to reach first?
You can programmatically create an HTTP request and retrieve the response:
// requires: using System; using System.IO; using System.Net; using System.Text;
string uri = "http://www.google.com/search";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";

// encode the data to POST:
string postData = "q=searchterm&hl=en";
byte[] encodedData = new ASCIIEncoding().GetBytes(postData);
request.ContentLength = encodedData.Length;
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(encodedData, 0, encodedData.Length);
}

// send the request and get the response
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
    // Do something with the response stream. As an example, we'll
    // stream the response to the console via a 256-character buffer
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        Char[] buffer = new Char[256];
        int count = reader.Read(buffer, 0, 256);
        while (count > 0)
        {
            // Write, not WriteLine, so the chunks aren't split by extra newlines
            Console.Write(new String(buffer, 0, count));
            count = reader.Read(buffer, 0, 256);
        }
    } // reader is disposed here
} // response is disposed here
Of course, this code will return an error since Google uses GET, not POST, for search queries.
This method will work if you are dealing with specific web pages, as the URLs and POST data are all basically hard-coded. If you needed something that was a little more dynamic, you'd have to do the following (a rough sketch is below):
Capture the page
Strip out the form
Create a POST string based on the form fields
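Here is a hedged sketch of those three steps. It assumes the target form uses simple <input name="..." value="..."> fields; the class and method names are mine, and for real pages an HTML parser (e.g. the Html Agility Pack) would be far more robust than these illustrative regexes.
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

class FormScraper
{
    static string BuildPostData(string pageUrl)
    {
        string html;
        using (var client = new WebClient())
            html = client.DownloadString(pageUrl);          // 1. capture the page

        string form = Regex.Match(html, "<form.*?</form>",  // 2. strip out the form
                                  RegexOptions.Singleline).Value;

        var pairs = new List<string>();                     // 3. build a POST string
        foreach (Match m in Regex.Matches(form,
                 "<input[^>]*name=\"([^\"]+)\"[^>]*value=\"([^\"]*)\""))
        {
            pairs.Add(Uri.EscapeDataString(m.Groups[1].Value) + "=" +
                      Uri.EscapeDataString(m.Groups[2].Value));
        }
        return string.Join("&", pairs.ToArray());
    }
}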
FWIW, I think something like Perl or Python might be better suited to that sort of task.
You might try Selenium. Record the actions in Firefox using Selenium IDE, save the script in C# format, then play them back using the Selenium RC C# wrapper. As others have mentioned you could also use System.Net.HttpWebRequest or System.Net.WebClient. If this is a desktop application see also System.Windows.Forms.WebBrowser.
Addendum: Similar to Selenium IDE and Selenium RC, which are Java-based, WatiN Test Recorder and WatiN are .NET-based.
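To give a feel for what a played-back script looks like, here is a minimal sketch using the Selenium RC C# client. It assumes a Selenium server running on localhost:4444, and the locators "q" and "btnG" (Google's search box and button names at the time) are illustrative assumptions:
using Selenium; // Selenium RC .NET client (Selenium.dll)

class SeleniumSketch
{
    static void Main()
    {
        ISelenium selenium = new DefaultSelenium(
            "localhost", 4444, "*firefox", "http://www.google.com/");
        selenium.Start();
        selenium.Open("/");
        selenium.Type("q", "Coding tricks");    // fill the search box
        selenium.Click("btnG");                 // click the search button
        selenium.WaitForPageToLoad("30000");    // wait up to 30s for the results
        string html = selenium.GetHtmlSource(); // read the resulting page
        selenium.Stop();
    }
}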
What you need to do is keep retrieving and analyzing the HTML source for each page in the chain. For each page, you need to figure out what the form submission will look like and send a request that matches it to get the next page in the chain.
What I do is build a custom class that wraps System.Net.HttpWebRequest/HttpWebResponse, so retrieving pages is as simple as using System.Net.WebClient. However, my custom class also keeps the same cookie container across requests and makes it a little easier to send POST data, customize the user agent, etc.
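For illustration, a minimal sketch of such a wrapper might look like the following; the class name, default user agent, and method shapes are mine, not a standard API:
using System.IO;
using System.Net;
using System.Text;

class StatefulWebClient
{
    // one cookie container shared by every request this client makes
    readonly CookieContainer cookies = new CookieContainer();
    public string UserAgent = "Mozilla/5.0 (compatible; MyBot/1.0)";

    public string Get(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = cookies;
        req.UserAgent = UserAgent;
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
            return reader.ReadToEnd();
    }

    public string Post(string url, string formData)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.CookieContainer = cookies;
        req.UserAgent = UserAgent;
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";
        byte[] body = Encoding.ASCII.GetBytes(formData);
        req.ContentLength = body.Length;
        using (var stream = req.GetRequestStream())
            stream.Write(body, 0, body.Length);
        using (var resp = (HttpWebResponse)req.GetResponse())
        using (var reader = new StreamReader(resp.GetResponseStream()))
            return reader.ReadToEnd();
    }
}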
Depending on how the website works, you may be able to manipulate the URL to perform what you want. E.g., to search for the word "beatles" you could just open a request to google.com/search?q=beatles and then read the results.
Alternatively, if the website does not use query-string values (in the URL) to process page actions, then you will need to build a WebRequest that POSTs the required values to the website instead. Search Google for examples of working with WebRequest and WebResponse.
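A minimal sketch of the query-string approach (the URL pattern is Google's search page from the example above; the search term is illustrative):
using System;
using System.Net;

class QueryStringSearch
{
    static void Main()
    {
        using (var client = new WebClient())
        {
            // the search term travels in the URL, so a plain GET is enough
            string html = client.DownloadString("http://www.google.com/search?q=beatles");
            Console.WriteLine(html.Length);
        }
    }
}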
Related
I am trying to post data from one page (aspx) to another (mvc). The way I am trying to go about it is in the code behind (aspx.cs) but so far the only way I have found to successfully redirect is to use:
Response.Redirect
which I can't use because it apparently only supports GET (which won't work because the data I want to transfer can potentially have too many characters for GET to handle). The other stipulation is that I can't use a session variable (the person I'm working with refuses to use session data).
I've looked into things like:
var tdd = new TempDataDictionary();
tdd.Add("subject",subject);
and
Context.Items["subject"] = subject;
but I'm not sure how I'd read that data in the other mvc controller Index method.
Use...
byte[] data = Encoding.ASCII.GetBytes("subject=" + subject); // byte[] containing your data to post
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "POST";
// ContentLength must be set before GetRequestStream() is called
request.ContentLength = data.Length;
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(data, 0, data.Length);
}
This is assuming you want to programmatically POST from the code behind.
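If you also need to see what the MVC action returned, you can read the response after the POST; a short continuation of the sketch above:
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
    string result = reader.ReadToEnd(); // body returned by the MVC action
}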
Session wouldn't work going from one web app to another. It only works from one page to another in the same site.
Hello, I am making a simple HttpWebRequest and reading the response (with a StreamReader); I just want to get the HTML of the page, but I only get one label (a single element of the page). In the browser everything is fine (I see the whole page), but when I set the browser to deny/disable cookies, I also get just that one label and everything else disappears. So my conclusion is that my HttpWebRequest behaves as if cookies were denied/disabled, since my code gets the same page the browser shows with cookies off.
You can go to https://www.bbvanetcash.com/local_kyop/KYOPSolicitarCredenciales.html and disable cookies with F12, and you will see the difference: the page shows only that one label.
So this is my code; any ideas what I need to change here?
HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create("https://www.bbvanetcash.com/local_kyop/KYOPSolicitarCredenciales.html");
HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
Stream streamResponseLogin = myHttpWebResponse.GetResponseStream();
StreamReader streamReadLogin = new StreamReader(streamResponseLogin);
LoginInfo = streamReadLogin.ReadToEnd();
Your code is receiving the complete page content, but it cannot receive the dynamic content. This is happening because the page you are trying to access relies on cookies to maintain a session, as well as on JavaScript (it uses jQuery) to load dynamic content and provide a rich user experience.
To successfully receive the whole page, your code must:
support retrieving, storing, and sending cookies across successive requests and responses (a sketch follows below);
be able to execute JavaScript code to load the dynamic content/markup of the page.
To test whether your code is receiving the proper values, visit the Web Sniffer site and enter your URL there.
As you can try on the Web Sniffer site with www.google.com, the response you get is a redirect instruction; that means that even to access Google's home page, your code must understand HTTP status messages (a 302 redirect in that case).
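Here is a minimal sketch of the cookie half, assuming a plain HttpWebRequest; the URL is the one from the question. Note that this does nothing about the JavaScript requirement, which HttpWebRequest cannot satisfy on its own:
using System;
using System.IO;
using System.Net;

class CookieAwareFetch
{
    static void Main()
    {
        // one container shared by every request, so cookies set by the
        // server are sent back automatically on the next request
        var cookies = new CookieContainer();

        var request = (HttpWebRequest)WebRequest.Create(
            "https://www.bbvanetcash.com/local_kyop/KYOPSolicitarCredenciales.html");
        request.CookieContainer = cookies;

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            Console.WriteLine(html.Length);
        }
        // any follow-up request given the same CookieContainer
        // will carry the session cookies along
    }
}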
I am trying to load a processed web page into a string, but it seems like it is loading the JavaScript as well; I want this to be "the final" result that can be saved to a static HTML file and run offline.
This is what I am doing at this moment
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(textBox9.Text);
IWebProxy theProxy = request.Proxy;
if (theProxy != null)
{
    theProxy.Credentials = CredentialCache.DefaultCredentials;
}
request.UseDefaultCredentials = true;
request.Proxy = WebRequest.DefaultWebProxy;
// execute the request
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// we will read data via the response stream
Stream resStream = response.GetResponseStream();
Any suggestions?
If I understand your post correctly, you don't want to strip the javascript out of the page, but keep it and make it so that it will execute just as though you had visited the page normally in a browser?
This is kind of a notoriously hard problem for proxies to overcome, and others have done it with varying degrees of success. Javascript that is embedded in the page should run just fine, but you will run into problems running any javascript that is loaded into a page from an external file.
One thing you could try is to rewrite the paths to external JavaScript libraries to reflect a local path, then grab copies of those JavaScript files over the network as well and store everything in a mimicked directory structure. Your mileage may vary based on how fancy the JavaScript involved is, e.g. some AJAX calls probably won't work no matter what you do.
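A rough sketch of that rewrite-and-mirror idea, assuming the page's external scripts are plain <script src="..."> tags; the names here are mine, and for anything beyond trivial markup an HTML parser would beat this illustrative regex:
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

class ScriptMirror
{
    static string MirrorScripts(string html, Uri pageUri, string localDir)
    {
        Directory.CreateDirectory(localDir);
        return Regex.Replace(html, "<script[^>]+src=\"([^\"]+)\"", match =>
        {
            // resolve relative script URLs against the page URL
            var scriptUri = new Uri(pageUri, match.Groups[1].Value);
            string fileName = Path.GetFileName(scriptUri.LocalPath);
            if (string.IsNullOrEmpty(fileName)) fileName = "script.js";

            // grab a local copy of the external script file
            using (var client = new WebClient())
                client.DownloadFile(scriptUri, Path.Combine(localDir, fileName));

            // point the tag at the saved copy
            return match.Value.Replace(match.Groups[1].Value, localDir + "/" + fileName);
        });
    }
}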
Similar threads may be:
How do I programmatically send information to a web service in C# with .Net?
Maintain button submit value when using link to submit a form
Looking how to send multiple values with an input type=submit form
How send a form with Javascript when input name is "submit"?
How do you submit a form with javascript with an <input type="button">
...All of the above seem to almost answer the question... but I'm totally mystified. I think it is done with something along the lines of "Net.Post(something)", but I am not sure.
I am currently using F#, and I have figured out how to parse links. I also figured out how to catch the various search bars and submission buttons with regexes.
I would like to use my code to search for something on a search engine by specifically:
First, obtaining the HTML
Second, Scraping the HTML for the various buttons, text bars, and links
Third, use some unknown method/device/tool/function to send a search string to a text box...
Fourth, Simulate an actual mouse click on the submit "button" that appears on the website...
...
Then, upon receipt of the server's response, pull the HTML from the next site.
Here is my link code as it stands:
open System.Text.RegularExpressions

type Url(x:string) =
    member this.Tostring = sprintf "%A" x
    member this.Request = System.Net.WebRequest.Create(x)
    member this.Response = this.Request.GetResponse()
    member this.Stream = this.Response.GetResponseStream()
    member this.Reader = new System.IO.StreamReader(this.Stream)
    member this.Html = this.Reader.ReadToEnd()

let linkex = "href=\\s*\"[^\"h]*(http://[^&\"]*)\""

let getLinks (txt:string) =
    [ for m in Regex.Matches(txt, linkex) -> m.Groups.Item(1).Value ]

// note: the member is Html (capital H); F# is case-sensitive
let collectLinks (url:Url) =
    url.Html |> getLinks
... I know how to grab the search box strings and whatnot... the question is, when I go to, say, google.com, grab the search bar the same way I grab the links, and update the value field with my search string, how do I then submit the updated search bar to Google's server?
Secondly, how do I do it if I want to update it and then simulate a mouse click?
In other words, I want to interact with websites the way the user interacts with websites.
There are essentially two options - some web pages accept data using HTTP GET (which means that the data is sent as part of the URL) and other use HTTP POST (which means that the data is sent as the body of the request).
If you're using e.g. Google, then you can use HTTP GET and put the query string in the URL. For example, you can just download a web page with the following URL: https://www.google.com/search?q=hello. So, all you need to do is to generate the URL as follows:
let search = sprintf "http://www.google.com/search?q=%s"
If you want to send HTTP POST request from F#, then you need to create a request with body that contains the form values in an encoded form. This can be written as follows:
open System.Text
open System.IO
open System.Net
// URL of a simple page that takes two HTTP POST parameters. See the
// form that submits there: http://www.snee.com/xml/crud/posttest.html
let url = "http://www.snee.com/xml/crud/posttest.cgi"
// Create & configure HTTP web request
let req = HttpWebRequest.Create(url) :?> HttpWebRequest
req.ProtocolVersion <- HttpVersion.Version10
req.Method <- "POST"
// Encode body with POST data as array of bytes
let postBytes = Encoding.ASCII.GetBytes("fname=Tomas&lname=Petricek")
req.ContentType <- "application/x-www-form-urlencoded";
req.ContentLength <- int64 postBytes.Length
// Write data to the request
let reqStream = req.GetRequestStream()
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close()
// Obtain response and download the resulting page
// (The sample contains the first & last name from POST data)
let resp = req.GetResponse()
let stream = resp.GetResponseStream()
let reader = new StreamReader(stream)
let html = reader.ReadToEnd()
As an aside, your use of a type with a member for every step is a bit odd. Members are re-evaluated each time you access them, so the code you wrote is pretty non-deterministic (every access to Response, for example, issues a new request). You should use let bindings instead.
I am supposed to post some data to a site using C#. I could post by just using a form and simple HTML code, but I do not want any user to be able to look at the source code.
My basic code is:
WebRequest request = WebRequest.Create("https://blabla.bla");
request.ContentType = "application/x-www-form-urlencoded";
request.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(parameters);
request.ContentLength = bytes.Length;
// "using" guarantees the request stream is closed; the original
// try/finally called GetRequestStream() again just to close it,
// and the empty catch silently swallowed any error
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(bytes, 0, bytes.Length);
}
This posts the data. But what would I do if I want the user to be transferred to the URL, bringing the needed variables along? Is it even possible? The site I want to be transferred to uses HTTPS.
If you want the user to end up at the page, you'll have to do the post from the client. This means the data has to be on the client. You could get the html of the page like you've done and write that out to the browser, but then if the user clicked anything or did anything with that rendered html, missing sessions/cookies etc could be a mess.
You could have a JavaScript function in an external minified/obfuscated JS file that takes in any necessary parameters, builds the form, and submits it from that JavaScript function. Yes, a user could still figure out what is happening if they dig deep enough, but the client browser has to know the data in order to send it. You have to find the trade-off between security and your userbase and their likelihood to dig into the source code.
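One hedged sketch of that approach, done inline rather than from an external JS file: the code-behind writes out a hidden form that submits itself as soon as it loads, so the browser performs the POST and the user lands on the target page. The method name, target URL, and field name are illustrative:
// in the aspx code-behind (System.Web types like Server/Response are available)
protected void PostAndTransfer(string subject)
{
    string html =
        "<html><body onload=\"document.forms[0].submit()\">" +
        "<form method=\"post\" action=\"https://example.com/target\">" +
        "<input type=\"hidden\" name=\"subject\" value=\"" +
        Server.HtmlEncode(subject) + "\" />" +
        "</form></body></html>";

    Response.Clear();
    Response.Write(html);  // the browser renders this and immediately posts the form
    Response.End();
}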