I want to build a piece of software that will process some HTML forms. The software will be a kind of bot that processes forms on my website automatically.
Is there anyone who can give me some basic steps for how to do this job? Any tutorials, samples, books, or whatever else would help me.
Can some of you post working code that uses the POST method?
Check out How to: Send Data Using the WebRequest Class. It gives an example of how to create a page that posts to another page using the HttpWebRequest class.
To fill out the form...
Find all of the INPUT or TEXTAREA elements that you want to fill out.
Build the data string that you are going to send back to the server. The string is formatted like "name1=value1&name2=value2" (just like in the querystring). Each value will need to be URL encoded.
If the form's "method" attribute is "GET", then take the URL in the "action" attribute, add a "?" and the data string, then make a "GET" web request to the URL.
If the form's "method" is "POST", then the data is submitted in the body of the web request instead of the URL. Take a look at this page for the C# code, or see the sketch below.
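For the POST case, here is a minimal, untested sketch (the URL and field names are placeholders) of the encode-and-send steps described above:
using System.IO;
using System.Net;
using System.Text;
using System.Web; // for HttpUtility.UrlEncode (requires a reference to System.Web)

// Build the "name1=value1&name2=value2" string, URL-encoding each value.
string postData = "name1=" + HttpUtility.UrlEncode("value 1")
                + "&name2=" + HttpUtility.UrlEncode("value&2");
byte[] bytes = Encoding.ASCII.GetBytes(postData);

// POST it to the URL from the form's "action" attribute (placeholder here).
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://example.com/form-action");
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = bytes.Length;
using (Stream body = request.GetRequestStream())
{
    body.Write(bytes, 0, bytes.Length);
}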
To expand on David's and JP's answers:
Assuming you're working with forms whose contents you're not familiar with, you can probably:
1. Pull the page with the form via an HttpWebRequest.
2. Load it into an XmlDocument.
3. Use XPath to traverse/select the form elements.
4. Build your query string/POST data based on the elements.
5. Send the data with HttpWebRequest.
If the form's structure is known in advance, you can really just start at #4.
An (untested) example:
using System.Net;
using System.Text;
using System.Web; // for HttpUtility.UrlEncode
using System.Xml;

HttpWebRequest request;
HttpWebResponse response;
XmlDocument xml = new XmlDocument();
string form_url = "http://...."; // you supply this
string form_submit_url;
XmlNodeList element_nodes;
XmlElement form_element;
StringBuilder query_string = new StringBuilder();
// #1 - pull the page containing the form
request = (HttpWebRequest)WebRequest.Create(form_url);
response = (HttpWebResponse)request.GetResponse();
// #2 - load it (assumes the page is well-formed XHTML)
xml.Load(response.GetResponseStream());
// #3a - locate the form and its submission URL
form_element = (XmlElement)xml.SelectSingleNode("//form[@name='formname']");
form_submit_url = form_element.GetAttribute("action");
// #3b - select the form's field elements
element_nodes = form_element.SelectNodes(".//input | .//select | .//textarea");
// #4 - build the query string, URL-encoding each value
foreach (XmlElement input_element in element_nodes)
{
    if (query_string.Length > 0) { query_string.Append("&"); }
    // MyFormElementValue() is a function/value you need to provide/define.
    string name = input_element.GetAttribute("name");
    query_string.Append(name + "=" + HttpUtility.UrlEncode(MyFormElementValue(name)));
}
// #5 - submit. This is a GET request; you can figure out POST as needed, and
// deduce the submission type via the <form> element's "method" attribute.
request = (HttpWebRequest)WebRequest.Create(form_submit_url + "?" + query_string.ToString());
References:
http://www.developerfusion.com/forum/thread/26371/
http://msdn.microsoft.com/en-us/library/system.xml.xmlelement.getattribute.aspx
http://msdn.microsoft.com/en-us/library/system.xml.xmlelement.selectnodes.aspx
If you don't want to go the HttpWebRequest route, I would suggest WatiN. Makes it very easy to automate IE or Firefox and not worry about the internals of the HTTP requests.
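A minimal WatiN sketch (untested; the URL and element names here are made up):
using WatiN.Core;

// Drives a real browser instance: type into a field, click, read the result.
using (var browser = new IE("http://example.com/search"))
{
    browser.TextField(Find.ByName("q")).TypeText("coding tricks");
    browser.Button(Find.ByName("btnSubmit")).Click();
    string resultHtml = browser.Html; // HTML of the page after submission
}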
I have the below ASP.NET page, which accepts a "url" query string key whose value can be an un-encoded URL:
http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/#hl=en&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=2803373l2803701l2l2803826l4l4l0l0l0l0l188l453l0.3l3l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=883
Now, from this page, how would it be possible to read the text after ".aspx?"?
I checked the Request.Url.AbsoluteUri property and it only showed
"http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/"
I also checked Request.QueryString with the code below:
private void getQueryString()
{
var sb = new StringBuilder();
var queryStringCount = Request.QueryString.Keys.Count;
for (int keyIndex = 0; keyIndex < queryStringCount; keyIndex++)
{
sb.Append(Request.QueryString.Keys[keyIndex]).Append("=").Append(Request.QueryString[keyIndex]);
if (keyIndex != (queryStringCount - 1))
{
sb.Append("&");
}
}
}
However, the text after "#" doesn't appear in any query string.
How would it be possible to read the text after ".aspx?"?
If you say it's not possible, then how does Google use "#" in their URL when you search for something?!
http://www.google.co.uk/#hl=en&site=&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=3317l3630l0l3755l4l4l0l0l0l0l125l391l3.1l4l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=849
Thanks,
It's not possible to get the value after the anchor (#) on the server side; you can verify this with Fiddler or something similar. The browser simply strips everything after the anchor before sending the request, so you have to deal with it on the client.
Retrieving Anchor Link In URL for ASP.Net
c# get complete URL with "#"
Update:
I don't know exactly how Google does this, but if you watch with Fiddler, after the initial request there is another request without the "#" (a Fiddler log of the request from your question shows the same thing).
So my advice is to look with Fiddler at how Google does it, or maybe ask another question.
Use Request.QueryString
http://localhost:4104/WebSiteForTest/TinyUrl.aspx?url=http://www.google.co.uk/#hl=en&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=2803373l2803701l2l2803826l4l4l0l0l0l0l188l453l0.3l3l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=883
<%=Request.QueryString["url"]%> will get the ?url parameter.
I assume you're using C# to do this. You can easily get the parameters and their values by iterating through the request object. Or in this case, since you know the name of the parameter, simply do this:
String url = Request.QueryString["url"];
More information on iterating through your request parameters can be found here.
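For reference, a minimal sketch of that iteration (equivalent to the loop in the question, just using AllKeys):
foreach (string key in Request.QueryString.AllKeys)
{
    string value = Request.QueryString[key];
    // e.g. key == "url", value == "http://www.google.co.uk/"
}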
The Uri type works as well:
string yourHttpUri = "....";
Uri yourUri = new Uri(yourHttpUri);
yourUri.Query    // "?url=http://www.google.co.uk/"
yourUri.Fragment // "#hl=en&q=life&oq=life&aq=f&aqi=g-s1g9&aql=&gs_sm=3&gs_upl=2803373l2803701l2l2803826l4l4l0l0l0l0l188l453l0.3l3l0&bav=on.2,or.r_gc.r_pw.r_cp.,cf.osb&fp=94681dc4659502d1&biw=1680&bih=883"
Edit:
Have you tried Request.Url.ToString()? (And creating a new Uri from the result.)
Similar threads may be:
How do I programmatically send information to a web service in C# with .Net?
Maintain button submit value when using link to submit a form
Looking how to send multiple values with an input type=submit form
How send a form with Javascript when input name is "submit"?
How do you submit a form with javascript with an < input type="button">
...All of the above seem to almost answer the question...but I'm totally mystified. I think something along the lines of "Net.Post(something)" is how it is done...but I am not sure.
I am currently using F#, and I have figured out how to parse links. I also figured out how to catch the various search bars and submission buttons with regexes.
I would like to use my code to search for something on a search engine, specifically:
First, obtaining the HTML
Second, Scraping the HTML for the various buttons, text bars, and links
Third, use some unknown method/device/tool/function to send a search string to a text box...
Fourth, simulate an actual mouse click on the submit "button" that appears on the website...
...
Then, upon receipt of the server's response, pull the HTML from the next site.
Here is my link code as it stands:
open System.Text.RegularExpressions

type Url(x:string) =
    member this.Tostring = sprintf "%A" x
    member this.Request  = System.Net.WebRequest.Create(x)
    member this.Response = this.Request.GetResponse()
    member this.Stream   = this.Response.GetResponseStream()
    member this.Reader   = new System.IO.StreamReader(this.Stream)
    member this.Html     = this.Reader.ReadToEnd()

let linkex = @"href=\s*""[^""h]*(http://[^&""]*)"""
let getLinks (txt:string) = [ for m in Regex.Matches(txt, linkex)
                                -> m.Groups.[1].Value ]
let collectLinks (url:Url) = url.Html
                             |> getLinks
...I know how to grab the search box strings and whatnot. The question is: when I go to, say, google.com, grab the search bar the same way I grab the links, and update the value field with my search string, how do I then submit the updated search bar to Google's server?
Secondly, how do I do it if I want to update it and then simulate a mouse click?
In other words, I want to interact with websites the way the user interacts with websites.
There are essentially two options - some web pages accept data using HTTP GET (which means that the data is sent as part of the URL) and others use HTTP POST (which means that the data is sent as the body of the request).
If you're using, e.g., Google, then you can use HTTP GET and put the query string in the URL. For example, you can just download a web page with the following URL: https://www.google.com/search?q=hello. So all you need to do is generate the URL as follows (escape the search term first, e.g. with Uri.EscapeDataString, if it can contain special characters):
let search = sprintf "http://www.google.com/search?q=%s"
If you want to send an HTTP POST request from F#, then you need to create a request whose body contains the form values in encoded form. This can be written as follows:
open System.Text
open System.IO
open System.Net
// URL of a simple page that takes two HTTP POST parameters. See the
// form that submits there: http://www.snee.com/xml/crud/posttest.html
let url = "http://www.snee.com/xml/crud/posttest.cgi"
// Create & configure HTTP web request
let req = HttpWebRequest.Create(url) :?> HttpWebRequest
req.ProtocolVersion <- HttpVersion.Version10
req.Method <- "POST"
// Encode body with POST data as array of bytes
let postBytes = Encoding.ASCII.GetBytes("fname=Tomas&lname=Petricek")
req.ContentType <- "application/x-www-form-urlencoded";
req.ContentLength <- int64 postBytes.Length
// Write data to the request
let reqStream = req.GetRequestStream()
reqStream.Write(postBytes, 0, postBytes.Length);
reqStream.Close()
// Obtain response and download the resulting page
// (The sample contains the first & last name from POST data)
let resp = req.GetResponse()
let stream = resp.GetResponseStream()
let reader = new StreamReader(stream)
let html = reader.ReadToEnd()
As an aside, your use of a type with a member for every step is a bit odd. Members are re-executed each time you access them, so the code you wrote is pretty non-deterministic (every access to Request, for example, creates a brand new request). You should use let bindings, which are evaluated once, instead.
I am reading from the history database, and for every URL read, I download it and store the data in a string. I want to be able to determine if the link is a download link, e.g. to a .exe or .zip file. I am assuming I need to read the headers to determine this, but I don't know how to do it with WebClient. Any suggestions?
while (sqlite_datareader.Read())
{
    noIndex = false;
    string url = (string)sqlite_datareader["url"];
    try
    {
        if (url.Contains("http") && (!url.Contains(".pdf")) && (!url.Contains(".jpg")) && (!url.Contains("https")) && !isInBlackList(url))
        {
            using (WebClient client = new WebClient())
            {
                client.Headers.Add("user-agent", "Only a test!");
                string htmlCode = client.DownloadString(url);
            }
        }
    }
    catch (WebException)
    {
        // ignore URLs that fail to download
    }
}
Instead of loading the complete content behind the link, I would issue a HEAD request; see the sketch after the quote below.
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The metainformation contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request. This method can be used for obtaining metainformation about the entity implied by the request without transferring the entity-body itself. This method is often used for testing hypertext links for validity, accessibility, and recent modification.
Quoted from http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
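WebClient doesn't let you change the HTTP verb, so a sketch of this with HttpWebRequest might look like the following (the content types checked are only examples):
var headRequest = (HttpWebRequest)WebRequest.Create(url);
headRequest.Method = "HEAD"; // ask for headers only, no body
using (var headResponse = (HttpWebResponse)headRequest.GetResponse())
{
    string contentType = headResponse.ContentType;
    bool looksLikeDownload =
        contentType.StartsWith("application/octet-stream") ||
        contentType.StartsWith("application/zip") ||
        contentType.StartsWith("application/x-msdownload"); // e.g. .exe
}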
See these questions for C# examples
How to check if a file exists on a server using c# and the WebClient class
How to check if System.Net.WebClient.DownloadData is downloading a binary file?
You're on the right track; you'll need to examine the ResponseHeaders after a successful request:
var someType = "application/zip";
if (client.ResponseHeaders["Content-Type"].Contains(someType)) {
// this was a "download link"
}
The tricky part will be in determining what constitutes a download link since there are so many content types possible. For example, how would you decide whether XML data is a download link or not?
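Another signal worth checking, when the server sends it, is the Content-Disposition header; a value containing "attachment" marks a response meant to be saved rather than displayed. A sketch:
string disposition = client.ResponseHeaders["Content-Disposition"];
bool isAttachment = disposition != null
    && disposition.IndexOf("attachment", StringComparison.OrdinalIgnoreCase) >= 0;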
Try checking WebClient's ResponseHeaders collection to validate the response file type.
In case anyone has the same problem: I used an attribute in the history database (places.sqlite) which came in very handy!
places.sqlite contains a table called moz_historyvisits which has a column visit_type. According to [1], a visit_type of 7 is a download link. Therefore, reading this value will determine whether a link is a download link without reading the response header or even sending a HEAD request.
[1] http://www.firefoxforensics.com/research/moz_historyvisits.shtml
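A sketch of reading that column, assuming the System.Data.SQLite provider (the file path is a placeholder):
using System;
using System.Data.SQLite;

using (var conn = new SQLiteConnection("Data Source=places.sqlite"))
{
    conn.Open();
    var cmd = new SQLiteCommand(
        "SELECT p.url FROM moz_places p " +
        "JOIN moz_historyvisits v ON v.place_id = p.id " +
        "WHERE v.visit_type = 7", conn); // 7 = download, per [1]
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine(reader.GetString(0)); // a download URL
        }
    }
}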
I need to redirect a user to http://www.someurl.com?id=2 using a POST method.
Is it possible? If yes, then how?
Right now I have the following, and it forwards the POST data properly, but it removes the ?id=2:
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.someurl.com?id=2");
request.Method = WebRequestMethods.Http.Post;
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = postData.Length;
using (StreamWriter writer = new StreamWriter(request.GetRequestStream()))
{
writer.Write(postData);
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
Response.Write(reader.ReadToEnd());
}
The reason I need both the query string data (?id=2) and the POST data is that I pass the query string to a page where JavaScript will handle the query string data, while .NET works with the data sent via the POST method. The POST data I am passing can be longer than the maximum number of characters the GET method allows, so I can't use only GET. So, what are your suggestions?
More information:
I am writing a routing page, which adds some custom information to the query string and then routes all of the data, old and new, on to some URL that was provided. This page should be able to redirect to our server as well as to someone else's server, and it doesn't need to know where the request came from or where it goes; it just needs to keep the same POST, GET, and header information, plus the additional information added at this step.
No. There is no fathomable reason to mix POST and GET.
If you need to make parameters passed in with the request available to Javascript, simply POST them to the server, and have the server spit out the relevant information in a hidden field...
<input type="hidden" value="id=2,foo=bar" disabled="disabled" />
Simple as that.
Note: Disable the hidden field to exclude it from the subsequent POST, if there is one ;)
I think the problem could be that postData does not contain the id parameter, as it is supplied through the query string.
Posted data is in the body of the request, and query string data is in the URL.
You probably need to fetch the id from Request.QueryString into your postData variable; see the sketch below.
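Something along these lines (a sketch; postData is the variable from the question):
// Merge the query-string id into the POST body before sending.
string id = Request.QueryString["id"];
postData = "id=" + HttpUtility.UrlEncode(id) + "&" + postData;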
Given the extra information to your question, that you require to submit to an external source, what I believe you must do is process all of the data and return a form with hidden fields. Add some javascript to submit that form to the external URL immediately upon load. Note that you won't get file uploads this way, but you can appropriately handle POST and GET data.
As far as I know, it isn't possible to redirect with POST. Couldn't you simply pretend (internally handle as if) that the request was made to the page you want to redirect the user to?
The closest answer I've found to this problem is here, but it's not transparent to the user, therefore it's not good enough for me --> Response.Redirect with POST instead of Get?
If anybody has any other suggestions, please respond!
Try sending all of the data including the id in POST. Then when you are processing the data in C#, you can read the id variable in and write it back out to your webpage within a tag:
<script type="text/javascript">
id = <%=request_id%>
</script>
Then just make sure your javascript starts running after fully loaded with an onload() call and you're good to go.
What you are actually trying to do is redirect your POST data. May I ask why? I can't see any reason why you would want to do this if in fact both pages are on your servers.
What you should be doing is processing all of your POST data in script #1, and then redirecting to something like script2.aspx?id=234 where the ID 234 is in reference to the data in your database. You can then recall it later on script2 and dump all the data into Javascript variables for your client-side stuff to use.
Either way, something about this process sounds fishy to me. Mixing up your data processing client-side and server side is like mixing vodka and milk. It rarely works well. (But white russians sure are tasty!)
Actually, I was able to achieve the desired result by mixing JavaScript and code-behind. So, what I've done is: in server-side code, I've built an entire web page like the following:
var strForm = new StringBuilder();
strForm.Append("<form id=\"" + formId + "\" name=\"" + formId + "\" action=\"" + url + queryString +"\" method=\"POST\">");
foreach (string key in data)
{
strForm.Append("<input type=\"hidden\" name=\"" + key + "\" value=\"" + data[key].Replace("\"", "&quot;") + "\">");
}
strForm.Append("</form>");
And in addition to this form built on server side, I add a javascript code that submits the form I've just built.
var strScript = new StringBuilder();
strScript.Append("<script language='javascript'>");
strScript.Append("var v" + formId + " = document." + formId + ";");
strScript.Append("v" + formId + ".submit();");
strScript.Append("</script>");
So, what this code does, as you can see, is this: the form action is a URL with query string parameters attached to it, but since the form method is POST, the values we added as hidden fields are submitted as POST parameters. So we end up submitting both POST and GET parameters.
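Presumably both builders are then written out to the response, so the browser renders the hidden form and the script immediately submits it:
// Emit the generated form followed by the auto-submit script.
Response.Write(strForm.ToString());
Response.Write(strScript.ToString());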
Hope this solution will help somebody =)
I have a project at work that requires me to be able to enter information into a web page, read the next page I get redirected to, and then take further action. A simplified real-world example would be something like going to google.com, entering "Coding tricks" as the search criteria, and reading the resulting page.
Small coding examples like the ones linked to at http://www.csharp-station.com/HowTo/HttpWebFetch.aspx tell how to read a web page, but not how to interact with it by submitting information into a form and continuing on to the next page.
For the record, I'm not building a malicious and/or spam related product.
So how do I go read web pages that require a few steps of normal browsing to reach first?
You can programmatically create an Http request and retrieve the response:
string uri = "http://www.google.com/search";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// encode the data to POST:
string postData = "q=searchterm&hl=en";
byte[] encodedData = new ASCIIEncoding().GetBytes(postData);
request.ContentLength = encodedData.Length;
Stream requestStream = request.GetRequestStream();
requestStream.Write(encodedData, 0, encodedData.Length);
// send the request and get the response
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
{
// Do something with the response stream. As an example, we'll
// stream the response to the console via a 256 character buffer
using (StreamReader reader = new StreamReader(response.GetResponseStream()))
{
Char[] buffer = new Char[256];
int count = reader.Read(buffer, 0, 256);
while (count > 0)
{
Console.WriteLine(new String(buffer, 0, count));
count = reader.Read(buffer, 0, 256);
}
} // reader is disposed here
} // response is disposed here
Of course, this code will return an error since Google uses GET, not POST, for search queries.
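For comparison, the GET version of the same search just appends the encoded data to the URL (a sketch):
string getUri = "http://www.google.com/search?q=" + Uri.EscapeDataString("searchterm") + "&hl=en";
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUri);
// GET is the default method; read the response stream exactly as above.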
This method will work if you are dealing with specific web pages, as the URLs and POST data are all basically hard-coded. If you needed something that was a little more dynamic, you'd have to:
Capture the page
Strip out the form
Create a POST string based on the form fields
FWIW, I think something like Perl or Python might be better suited to that sort of task.
edit: x-www-form-urlencoded
You might try Selenium. Record the actions in Firefox using Selenium IDE, save the script in C# format, then play them back using the Selenium RC C# wrapper. As others have mentioned you could also use System.Net.HttpWebRequest or System.Net.WebClient. If this is a desktop application see also System.Windows.Forms.WebBrowser.
Addendum: Similar to Selenium IDE and Selenium RC, which are Java-based, WatiN Test Recorder and WatiN are .NET-based.
What you need to do is keep retrieving and analyzing the html source for each page in the chain. For each page, you need to figure out what the form submission will look like and send a request that will match that to get the next page in the chain.
What I do is build a custom class that wraps System.Net.HttpWebRequest/HttpWebResponse, so retrieving pages is as simple as using System.Net.WebClient. However, my custom class also keeps the same cookie container across requests and makes it a little easier to send POST data, customize the user agent, etc.
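A sketch of such a wrapper (simplified, with error handling omitted):
using System.IO;
using System.Net;
using System.Text;

class WebSession
{
    // One cookie container shared across requests, so logins/sessions
    // survive from page to page.
    private readonly CookieContainer cookies = new CookieContainer();

    public string UserAgent { get; set; }

    public string Get(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        request.UserAgent = UserAgent;
        return ReadResponse(request);
    }

    public string Post(string url, string formData)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        request.UserAgent = UserAgent;
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        byte[] bytes = Encoding.ASCII.GetBytes(formData);
        request.ContentLength = bytes.Length;
        using (var stream = request.GetRequestStream())
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        return ReadResponse(request);
    }

    private static string ReadResponse(HttpWebRequest request)
    {
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}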
Depending on how the website works, you may be able to manipulate the URL to do what you want, e.g. to search for the word "beatles" you could just open a request to google.com?q=beatles and then read the results.
Alternatively, if the website does not use query string values (in the URL) to process page actions, then you will need to craft a web request which POSTs the required values to the website instead. Search Google for working with WebRequest and WebResponse.