Programmatically clicking links in C#

I'm trying to write a script that clicks links programmatically.
I managed to get the whole HTML page into a string. Now I want to somehow click the elements in it. I'm kind of lost, so any info could help.
I've tried to get the document as an HtmlDocument, but for some reason the GetElementById method doesn't find the element.
Please, any info would help.
Thanks.
This is the code I have so far.
It gets me to the point where I have a string whose value is the HTML document. Now I need to somehow extract the relevant tag and click it programmatically.
Thanks for your input,
still waiting for someone who can help me.
using System;
using System.IO;
using System.Net;
using System.Text;

string email = "someemail*";
string pw = "somepass";
string postData = String.Format("email={0}&pass={1}", email, pw);
CookieContainer cookieContainer = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.facebook.com/*******");
req.CookieContainer = cookieContainer;
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
req.AllowAutoRedirect = true;
req.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
byte[] loginDataBytes = Encoding.ASCII.GetBytes(postData);
// Content-Length is the number of bytes sent, not the number of characters.
req.ContentLength = loginDataBytes.Length;
using (Stream stream = req.GetRequestStream())
{
    stream.Write(loginDataBytes, 0, loginDataBytes.Length);
}
string html;
using (HttpWebResponse webResp = (HttpWebResponse)req.GetResponse())
using (StreamReader reader = new StreamReader(webResp.GetResponseStream()))
{
    // Read the whole response body (the HTML page) into one string.
    html = reader.ReadToEnd();
}

You might want to look at WatiN; it lets you do all of this very easily.

A "click on a link" is the same thing as sending an HTTP request. If you can parse the URI from the document you have, you can create the HTTP request separately and send that.
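As a minimal sketch of that idea: pull the href out of the HTML string, resolve it against the page address, and build a GET request for it. The class and method names (LinkClicker, FindHref, CreateClickRequest) are made up for illustration, and the regular expression assumes a double-quoted id attribute that appears before a double-quoted href; a real HTML parser is more robust than this.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class LinkClicker
{
    // Returns the href of the first <a> tag with the given id, or null if none is found.
    // Assumption: id comes before href and both attributes use double quotes.
    public static string FindHref(string html, string linkId)
    {
        string pattern = "<a\\b[^>]*\\bid=\"" + Regex.Escape(linkId) + "\"[^>]*\\bhref=\"([^\"]*)\"";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        // Attribute values are HTML-encoded in the page source (&amp; and so on).
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Builds (but does not send) the GET request that a click on the link would cause.
    public static HttpWebRequest CreateClickRequest(Uri pageUri, string href, CookieContainer cookies)
    {
        // Relative hrefs are resolved against the address of the page they came from.
        Uri target = new Uri(pageUri, href);
        var request = (HttpWebRequest)WebRequest.Create(target);
        request.Method = "GET";
        // Reuse the cookies from the login request so the session carries over.
        request.CookieContainer = cookies;
        return request;
    }
}
```

Calling GetResponse() on the returned request then performs the "click". This only covers plain links; links that rely on JavaScript need a different approach.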

Clicking on a link is done by issuing an HTTP GET for the href of the link.
If there is JavaScript interactivity, you need to use a WebBrowser control and inject JavaScript that, on document.ready, executes document.getElementById("whatever").click()
See
How do I programmatically click a link with javascript?
You can use HTML Agility Pack to parse an HTML document and extract the href attribute.

Related

Invalid Uri : The uri scheme is not valid

I am trying to log in to a website via WebRequest. I get an exception at this line:
WebRequest req = WebRequest.Create(formUrl.Trim());
The url, username, and password strings come from text boxes. This is the full code:
public void LoginToUrl(string url, string username, string password)
{
    formUrl = url;
    formParams = string.Format("username={0}&password={1}", username, password);
    WebRequest req = WebRequest.Create(formUrl.Trim());
    req.ContentType = "application/x-www-form-urlencoded";
    req.Method = "POST";
    bytes = Encoding.ASCII.GetBytes(formParams);
    req.ContentLength = bytes.Length;
    using (Stream os = req.GetRequestStream())
    {
        os.Write(bytes, 0, bytes.Length);
    }
    WebResponse resp = req.GetResponse();
    cookieHeader = resp.Headers["Set-cookie"];
}
This is the POST Data:
Host=internetlogin1.cu.edu.ng
User-Agent=Mozilla/5.0 (Windows NT 10.0; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-US,en;q=0.5
Accept-Encoding=gzip, deflate
Refer this link
Connection=keep-alive
Content-Type=application/x-www-form-urlencoded
Content-Length=49
POSTDATA=dst=&popup=true&username=13ck015373&password=F3NB
You should pass a valid URL to create a WebRequest.
The error says that the URL (which comes from the text box) does not contain a scheme ('http://' or 'https://') or is otherwise invalid.
Enter this URL in the text box (don't forget http or https):
http://internetlogin1.cu.edu.ng or https://internetlogin1.cu.edu.ng
If the URL has parameters, add them to the query string using the '?' and '&' characters.
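One way to catch this before calling WebRequest.Create is to validate the text box value first. The following is a sketch only; UrlCheck and TryGetHttpUri are hypothetical names, and stripping surrounding quotation marks is an assumption about what users might paste.

```csharp
using System;

public static class UrlCheck
{
    // Returns true and sets uri only when the input is an absolute http or https URL.
    public static bool TryGetHttpUri(string input, out Uri uri)
    {
        uri = null;
        if (string.IsNullOrWhiteSpace(input))
        {
            return false;
        }
        // Remove surrounding whitespace and any quotation marks pasted along with the URL.
        string cleaned = input.Trim().Trim('"');
        Uri candidate;
        if (!Uri.TryCreate(cleaned, UriKind.Absolute, out candidate))
        {
            return false; // no scheme, or otherwise not a valid absolute URI
        }
        if (candidate.Scheme != Uri.UriSchemeHttp && candidate.Scheme != Uri.UriSchemeHttps)
        {
            return false; // e.g. ftp://, or "host:port" parsed as a scheme
        }
        uri = candidate;
        return true;
    }
}
```

If the check fails, the program can show a message asking for a full URL instead of letting WebRequest.Create throw.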
I had an invalid URI scheme issue as well. I had to do the following for it to work. Not sure why, but FYI.
Works:
Uri serverUri = new Uri("http://url.com/sub/somethingService");
var webRequest = (HttpWebRequest)WebRequest.Create(serverUri);
What did not work (gave this same error):
var webRequest = (HttpWebRequest)WebRequest.Create("http://url.com/sub/somethingService");
Please add 'http://' or 'https://' to your URL, or check that it is not wrapped in quotation marks ("").
For example, "http://YourUrl:YourPort" (with the quotes) is not correct and should be replaced with http://YourUrl:YourPort. Paste your URL into Notepad to help you find the problem.

POST a form from C# and wait until JavaScript has been executed

I am submitting an aspx page from my C# code-behind, which works well; however, I get the raw HTML straight back because I do not display it in a browser.
This is my C# code:
string getUrl = "https://www.facebook.com/login.php?login_attempt=1";
string email = "email#email.com";
string pw = "pwd";
string postData = String.Format("email={0}&pass={1}", email, pw);
HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(getUrl);
getRequest.CookieContainer = new CookieContainer();
getRequest.CookieContainer.Add(cookies); //recover cookies First request
getRequest.Method = WebRequestMethods.Http.Post;
getRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
getRequest.AllowWriteStreamBuffering = true;
getRequest.ProtocolVersion = HttpVersion.Version11;
getRequest.AllowAutoRedirect = true;
getRequest.ContentType = "application/x-www-form-urlencoded";
byte[] byteArray = Encoding.ASCII.GetBytes(postData);
getRequest.ContentLength = byteArray.Length;
Stream newStream = getRequest.GetRequestStream(); //open connection
newStream.Write(byteArray, 0, byteArray.Length); // Send the data.
newStream.Close();
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
string sourceCode = "";
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
sourceCode = sr.ReadToEnd();
}
This gives me the response I need. However, the page should run its JavaScript first and then be delivered back as HTML; unfortunately it does not do this.
I have been looking for a way to:
- Open this page on POST in a popup browser (or at least post this and wait until the JavaScript has completely loaded)
- Get the loaded page back after the JavaScript is fully loaded, instead of the current response
Of course I would prefer this to happen before:
HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();
I have tried to:
- use WebBrowser wb = new WebBrowser();, however this gives a single-thread error and it does not seem possible to POST the page to the URL.
- use while (wb.ReadyState != WebBrowserReadyState.Complete), which can't be used because I do not load an actual page but only get the response.
Does anybody have a good, smart idea for loading the page in a browser with the POST, waiting until the JavaScript has executed, and then loading this HTML into my C# code?

HTTP Post with JSON

I can't seem to get the hang of HTTP POST. I have just learned how to use GET to retrieve web pages, but now I'm trying to fill in information on the web page and can't get it working. The source code that comes back is always an invalid page (full of broken images/not the right information)
public static void jsonPOST(string url)
{
url = "http://treasurer.maricopa.gov/Parcel/TaxReceipt.aspx/GetTaxReceipt";
var httpWebRequest = (HttpWebRequest)WebRequest.Create(new Uri(url));
httpWebRequest.ContentType = "application/json; charset=utf-8";
httpWebRequest.Accept = "application/json, text/javascript, */*; q=0.01";
httpWebRequest.Headers.Add("Accept-Encoding: gzip, deflate");
httpWebRequest.CookieContainer = cookieJar;
httpWebRequest.Method = "POST";
httpWebRequest.Headers.Add("Accept-Language: en-US,en;q=0.5");
httpWebRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW65; Trident/7.0; MAM5; rv:11.0) like Gecko";
httpWebRequest.Referer = "http://treasurer.maricopa.gov/Parcel/TaxReceipt.aspx";
string postData = "{\"startDate\":\"1/1/2013\",\"parcelNumber\":\"17609419\"}";
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(postData);
httpWebRequest.ContentLength = bytes.Length;
System.IO.Stream os = httpWebRequest.GetRequestStream();
os.Write(bytes, 0, bytes.Length); //Push it out there
os.Close();
System.Net.WebResponse resp = httpWebRequest.GetResponse();
if (resp == null)
{
Console.WriteLine("null");
}
System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream());
string source = sr.ReadToEnd().Trim();
}
EDIT: I updated the code to reflect my new problem. The problem I have now is that the page source is not what comes back to me. I am getting just the raw JSON in the response, which I can deserialize to get the information I need, but I'm curious why the actual page source isn't coming back.
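Since the endpoint is being sent JSON and answers with JSON, it may be easier to let a serializer handle both directions than to escape the body by hand. The sketch below uses System.Text.Json, which is not what the question uses and is only one option; TaxReceiptRequest, BuildBody, and ReadString are hypothetical names, and the shape of the real response is not known here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class TaxReceiptRequest
{
    // Builds the UTF-8 request body for the two parameters used in the question.
    public static byte[] BuildBody(string startDate, string parcelNumber)
    {
        var payload = new Dictionary<string, string>
        {
            ["startDate"] = startDate,
            ["parcelNumber"] = parcelNumber
        };
        // The serializer takes care of quoting and escaping.
        string json = JsonSerializer.Serialize(payload);
        return Encoding.UTF8.GetBytes(json);
    }

    // Reads one top-level string property from a JSON object, or null if it is absent.
    // Assumption: the root of the JSON text is an object.
    public static string ReadString(string responseJson, string propertyName)
    {
        using (JsonDocument doc = JsonDocument.Parse(responseJson))
        {
            JsonElement value;
            return doc.RootElement.TryGetProperty(propertyName, out value)
                && value.ValueKind == JsonValueKind.String
                ? value.GetString()
                : null;
        }
    }
}
```

The bytes from BuildBody can be written to the request stream in place of the hand-built postData string, and ReadString can be pointed at whatever property the actual response turns out to contain.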
The source code that comes back is always an invalid page (full of broken images/not the right information)
It sounds like you just fetch the source code without accounting for relative paths. As long as the site uses relative paths, your copy will not display correctly. You have to replace all the relative paths with absolute ones before it is useful.
http://webdesign.about.com/od/beginningtutorials/a/aa040502a.htm
Remember that cross-domain AJAX can be a problem in that situation.

Scraping a website to get the element name and id through C# web browser

I am trying to scrape a website to get the Textarea information.
I'm using:
HtmlDocument doc = this.webBrowser1.Document;
When I look at the view source it shows <textarea name="message" class="profile">
But when I try to access this textarea with:
HtmlDocument doc = this.webBrowser1.Document;
doc.GetElementsByTagName("textarea")
.GetElementsByName("message")[0]
.SetAttribute("value", "Hello");
It shows the error:
Value of '0' is not valid for 'index'. 'index' should be between 0 and -1.
Parameter name: index
Any Help?
For your current need you can simply use this:
doc.GetElementsByTagName("textarea")[0].InnerText = "Hello";
For more complex things you can use the HtmlDocument class together with MSHTML.
I can recommend HtmlAgilityPack to you!
I assume you are trying to access a website that uses cookies to determine whether a user is logged in. If you are not, it forces you to register or log in; otherwise you aren't allowed to see anything. Am I right?
Your browser stores those cookies; your C# code does not (broadly speaking).
You need to create a cookie container to solve that problem.
Your C# app can log in, request a cookie/session, and grab the cookies from the response header; then you should be able to scrape the profiles or whatever you want.
Get the POST data that is sent to the server. You can use tools/add-ons like Fiddler, Tamper, etc.
E.g. PostdataString: user_name=TESTUSER&password=TESTPASSWORD&language=en&action%3Asubmit=Submit
Here is a snippet you can use.
//Create the PostData
string strPostData = "user_name=" + txtUser.Text + "&password=" + txtPass.Text + "&language=en&action%3Asubmit=Submit";
CookieContainer tempCookies = new CookieContainer();
ASCIIEncoding encoding = new ASCIIEncoding();
byte[] data = encoding.GetBytes(strPostData);
//Create the request
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.website.com/login.php");
request.Method = "POST";
request.KeepAlive = true;
request.AllowAutoRedirect = false;
//Header values only; the header names are added by the properties themselves
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
request.ContentType = "application/x-www-form-urlencoded";
request.Referer = "http://www.website.com/login.php";
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1";
request.ContentLength = data.Length;
//Write the body and close the request stream before asking for the response
using (Stream requestStream = request.GetRequestStream())
{
    requestStream.Write(data, 0, data.Length);
}
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
string sRequestHeaderBuffer = Convert.ToString(response.Headers);
//Stream(-output) of the new website
using (StreamReader postReqReader = new StreamReader(response.GetResponseStream()))
{
    //RichTextBox to see the new source.
    richTextBox1.Text = postReqReader.ReadToEnd();
}
You will need to adjust the cookie parameters in between and also add your current session ID to the code. This depends on the website you are requesting.
E.g.:
request.Headers.Add("Cookie", "language=en_US.UTF-8; StationID=" + sStationID + "; SessionID=" + sSessionID);
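An alternative to building the Cookie header by hand is to share one CookieContainer between the login request and the later requests, so that cookies set by the server are stored and sent back automatically. This is a sketch under that assumption; SessionRequests and Create are hypothetical names, and the URLs are the placeholders used above.

```csharp
using System;
using System.Net;

public static class SessionRequests
{
    // Creates a request that reads cookies from, and stores cookies into, the shared jar.
    public static HttpWebRequest Create(string url, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        // Cookies from earlier responses that match this URL are sent with the request,
        // and Set-Cookie headers in the response are added back to the jar.
        request.CookieContainer = jar;
        return request;
    }
}
```

In use, the same jar instance is passed to the login POST and to every request made afterwards, for example SessionRequests.Create("http://www.website.com/login.php", jar) followed by requests for the pages to scrape.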

Screen scrape an ASP.NET Page not working

I am trying to bring back the calendar events on the page at the following site: http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
Notice that this site has a link called "Month". I need to be able to POST data requesting calendar events for a particular month, but I cannot get this to work. Here is the code:
private static void GetData(ref string buf)
{
try
{
//First, request the search form to get the viewstate value
HttpWebRequest webRequest = default(HttpWebRequest);
webRequest = (HttpWebRequest)System.Net.WebRequest.Create("http://www.wphospital.org/News-Events/Calendar-of-Events.aspx");
StreamReader responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
string responseData = responseReader.ReadToEnd();
responseReader.Close();
//Extract the viewstate value and build out POST data
string viewState = ExtractViewState(responseData);
string eventValidation = ExtractEventValidation(responseData);
string postData = null;
postData = String.Format("ctl00$manScript={0}&__EVENTTARGET=&__EVENTARGUMENT&__LASTFOCUS=&__VIEWSTATE={1}&lng={2}&__EVENTVALIDATION={3}&ctl00$searchbox1$txtWord={4}&textfield2={5}&ctl00$plcMain$lstbxCategory={6}&ctl00$plcMain$lstbxSubCategory={7}", "ctl00$plcMain$updMonthNav|ctl00$plcMain$btnNextMonth", viewState, "en-US", eventValidation, "Search", "your search here", 0, 0);
var encoding = new ASCIIEncoding();
byte[] data = encoding.GetBytes(postData);
//Now post to the search form
webRequest = (HttpWebRequest)System.Net.WebRequest.Create("http://www.wphospital.org/News-Events/Calendar-of-Events.aspx");
webRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.ContentLength = data.Length;
var newStream = webRequest.GetRequestStream();
newStream.Write(data, 0, data.Length);
newStream.Close();
responseReader = new StreamReader(webRequest.GetResponse().GetResponseStream());
//And read the response
responseData = responseReader.ReadToEnd();
responseReader.Close();
buf = responseData;
}
catch (WebException ex)
{
if (ex.Status == WebExceptionStatus.ProtocolError)
{
Console.Write("The server returned protocol error ");
// Get HttpWebResponse so that you can check the HTTP status code.
HttpWebResponse httpResponse = (HttpWebResponse)ex.Response;
int sc = (int)httpResponse.StatusCode;
string strsc = httpResponse.StatusCode.ToString();
}
}
}
private static string ExtractViewState(string s)
{
string viewStateNameDelimiter = "__VIEWSTATE";
string valueDelimiter = "value=\"";
int viewStateNamePosition = s.IndexOf(viewStateNameDelimiter);
int viewStateValuePosition = s.IndexOf(valueDelimiter, viewStateNamePosition);
int viewStateStartPosition = viewStateValuePosition + valueDelimiter.Length;
int viewStateEndPosition = s.IndexOf("\"", viewStateStartPosition);
return HttpUtility.UrlEncodeUnicode(s.Substring(viewStateStartPosition, viewStateEndPosition - viewStateStartPosition));
}
Can anyone point me in the right direction?
This may or may not solve your problem because I don't know exactly what the problem is when you say it's not working. But as "Al W" noted - the response from an async postback is not going to look like a straight HTML stream. So if your problem is parsing it afterwards then this might help.
I had the "opportunity" to discover this recently because I needed to rewrite that output. I'm working on a C# jQuery port and found that I was breaking WebForms pages when I tried to re-render the output stream during an async postback. I went through the client script that parses the response and figured out the format of the response.
Each panel that is updated will return a block of data that is formatted like:
"Length|Type|ID|Content"
There could be any number of these strung together. Type is "updatePanel" for UpdatePanels. ID is the UniqueID of the control, and Content is the actual HTML data. Length is equal to the number of bytes in Content, and you need to use that to parse each block, because the separator character may appear inside Content itself. So if you decided you wanted to rewrite this data before sending it back to an ASP.NET page (like I did) you need to update Length to reflect the final length of your content.
The code I used to parse and rewrite it is in Server/CsQueryHttpContext.
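The block format described above can be read with a small length-driven parser. What follows is a sketch, not the CsQuery code: DeltaParser and Block are hypothetical names, Length is treated as a count of characters in the C# string (the answer says bytes; the two agree for ASCII content), and a single trailing '|' after each Content is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class DeltaParser
{
    public sealed class Block
    {
        public string Type;
        public string Id;
        public string Content;
    }

    // Splits a response made of "Length|Type|ID|Content" blocks into its parts.
    public static List<Block> Parse(string delta)
    {
        var blocks = new List<Block>();
        int pos = 0;
        while (pos < delta.Length)
        {
            int length = int.Parse(ReadField(delta, ref pos), CultureInfo.InvariantCulture);
            string type = ReadField(delta, ref pos);
            string id = ReadField(delta, ref pos);
            if (pos + length > delta.Length)
            {
                throw new FormatException("Content is shorter than its declared length.");
            }
            // Use Length rather than searching for '|', because Content may contain '|'.
            string content = delta.Substring(pos, length);
            pos += length;
            // Assumption: one separator follows each Content before the next block.
            if (pos < delta.Length && delta[pos] == '|')
            {
                pos++;
            }
            blocks.Add(new Block { Type = type, Id = id, Content = content });
        }
        return blocks;
    }

    // Reads the text up to the next '|' and moves pos past that separator.
    private static string ReadField(string s, ref int pos)
    {
        int bar = s.IndexOf('|', pos);
        if (bar < 0)
        {
            throw new FormatException("Missing '|' separator.");
        }
        string field = s.Substring(pos, bar - pos);
        pos = bar + 1;
        return field;
    }
}
```

After parsing, the blocks whose Type is "updatePanel" hold the HTML fragments for the updated panels.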
For POST operations, you want it to be UTF-8 encoded, so just re-do the one line
//var encoding = new ASCIIEncoding();
//byte[] data = encoding.GetBytes(postData);
//do this instead.....
byte[] data = Encoding.UTF8.GetBytes(postData);
and see if this helps you out
Below is the network trace I get in Chrome when clicking the Month button. Notice the __EVENTTARGET:ctl00$plcMain$monthBtn entry. ASP.NET includes a JavaScript framework that calls a JavaScript postback method (__doPostBack in Web Forms) when that link is clicked, which sets the event target. That's basically how ASP.NET Web Forms knows which event to fire on a postback. One tricky thing here is that the web page is using an UpdatePanel, so you might not get a true HTML response. If you can get your request to look something like this, then you should get a successful response. Hope this helps.
Request URL:http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
Request Method:POST
Status Code:200 OK
Request Headers
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Content-Length:9718
Content-Type:application/x-www-form-urlencoded
Cookie:CMSPreferredCulture=en-US; ASP.NET_SessionId=h2nval45vq0q5yb0cp233huc; __utma=101137351.234148951.1312486481.1312486481.1312486481.1; __utmb=101137351.1.10.1312486481; __utmc=101137351; __utmz=101137351.1312486481.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __unam=ef169fe-131964a5f2a-24ec879b-1
Host:www.wphospital.org
Origin:http://www.wphospital.org
Proxy-Connection:keep-alive
Referer:http://www.wphospital.org/News-Events/Calendar-of-Events.aspx
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.124 Safari/534.30
X-MicrosoftAjax:Delta=true
Form Data
ctl00$manScript:ctl00$plcMain$updTab|ctl00$plcMain$monthBtn
__EVENTTARGET:ctl00$plcMain$monthBtn
__EVENTARGUMENT:
__LASTFOCUS:
__VIEWSTATE:<removed for brevity>
lng:en-US
__EVENTVALIDATION:/wEWEgLbj/nSDgKt983zDgKWlOLbAQKr3LqFAwKL3uqpBwK9kfRnArDHltMCAuTk0eAHAsfniK0DAteIosMPAsiIosMPAsmIosMPAsuIosMPAoD0ookDApCbiOcPAo biOcPAombiOcPAoubiOcPyfqRx8FdqYzlnnkXcJEJZzzopJY=
ctl00$searchbox1$txtWord:Search
textfield2:Enter your search here
ctl00$plcMain$lstbxCategory:0
ctl00$plcMain$lstbxSubCategory:0
ctl00$plcMain$hdnEventCount:2
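To reproduce a form body like the one in this trace, each field name and value has to be URL-encoded before the pairs are joined with '&'. The helper below is a sketch of that step only; FormEncoder and Encode are hypothetical names, and the field list passed in would come from the trace and from the __VIEWSTATE and __EVENTVALIDATION values scraped from the first response.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class FormEncoder
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        // Encode names as well as values: control names such as ctl00$plcMain$monthBtn
        // contain '$', and values such as __VIEWSTATE can contain '+', '/' and '='.
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value ?? "")));
    }
}
```

The resulting string is what gets converted to bytes and written to the request stream. The trace also shows an X-MicrosoftAjax: Delta=true request header, which can be set with webRequest.Headers.Add("X-MicrosoftAjax", "Delta=true") when imitating the UpdatePanel request.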
