Console app to login to ASP.NET website - c#

First, please excuse my naivety with this subject. I'm a retired programmer who started before DOS was around. I'm not an expert on ASP.NET. Part of what I need to know is what I need to know. (If you follow me...)
So I want to log into a web site and scrape some content. After looking at the HTML source with Notepad and Fiddler2, it's clear to me that the site is implemented with ASP.NET technologies.
I started by doing a lot of googling and reading everything I could find about writing screen scrapers in C#. After some investigation and many attempts, I've come to the conclusion that it isn't easy.
The crux of the problem (as I see it now) is that ASP provides lots of ways for a programmer to maintain state. Cookies, viewstate, session vars, page vars, get and post params, etc. Plus the programmer can divide the work up between server and client scripting. A rich web client such as IE or Safari or Chrome or Firefox knows how to handle whatever the programmer writes (and the ASP framework implements under the covers).
WebClient isn't a rich web client. It doesn't even handle cookies out of the box.
So I'm at an impasse. One way to go is to try to reverse engineer all the features of the rich client that the ASP application is expecting and write a WebClient-on-steroids class that mimics a rich client well enough to get logged in.
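The first step down that road would presumably be a cookie-aware WebClient, something like this sketch:

// Minimal sketch: a WebClient that remembers cookies across requests.
// WebClient itself never stores cookies; attaching one shared CookieContainer
// to each underlying HttpWebRequest is the usual workaround.
public class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
            httpRequest.CookieContainer = cookies;
        return request;
    }
}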
Or I could try embedding IE (or some other rich client) into my app and hope the exposed interface is rich enough that I can programmatically fill a username and password field and POST the form back. (And access the response stream so I can parse the HTML to scrape out the data I'm after...)
Or I could look for some 3rd party control that is a lot richer than WebClient.
Can anyone shed some keen insight into where I should focus my attention?
This is as much a learning experience as a project. That said, I really want to automate login and information retrieval from the target site.

Here's an example function I use to log in to a website and get the cookie:
string loginSite(string url, string username, string password)
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
    string cookie = "";
    // these form fields will change depending on the website
    string values = "vb_login_username=" + username + "&vb_login_password=" + password
                  + "&securitytoken=guest"
                  + "&cookieuser=checked"
                  + "&do=login";
    req.Method = "POST";
    req.ContentType = "application/x-www-form-urlencoded";
    req.ContentLength = System.Text.Encoding.ASCII.GetByteCount(values); // byte count, not character count
    req.CookieContainer = new CookieContainer();
    System.Net.ServicePointManager.Expect100Continue = false; // prevents 417 error

    using (StreamWriter writer = new StreamWriter(req.GetRequestStream(), System.Text.Encoding.ASCII))
    {
        writer.Write(values);
    }

    using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        reader.ReadToEnd(); // read the page source (discarded here)
        // collect the session cookies the server handed back
        foreach (Cookie cook in response.Cookies)
        {
            cookie = cookie + cook.ToString() + ";";
        }
    }
    return cookie;
}
And here's a call example:
string cookie = loginSite("http://theurl.com/login.php?s=c29cea718f052eae2c6ed105df2b7172&do=login", "user", "passwd");
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.theurl.com");
// once you have the cookie you add it to the request header
req.Headers.Add("Cookie", cookie);
string htmlReturn;
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
using (Stream respStream = response.GetResponseStream())
using (StreamReader sr = new StreamReader(respStream))
{
    htmlReturn = sr.ReadToEnd();
    // System.Diagnostics.Debugger.Break();
}
With Firefox you can use the Live HTTP Headers extension to see which parameters are being sent in the POST, and then change the values variable:
string values = "vb_login_username=" + username + "&vb_login_password=" + password
              + "&securitytoken=guest"
              + "&cookieuser=checked"
              + "&do=login";
so that it matches the parameters of the destination website.
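If the username or password can contain characters like "&", "=" or spaces, it is safer to URL-encode each value first. A small sketch:

// Sketch: URL-encode each form value so characters like '&', '=' or spaces
// in the username/password don't corrupt the body. Uri.EscapeDataString
// avoids needing a reference to System.Web.
string values = "vb_login_username=" + Uri.EscapeDataString(username)
              + "&vb_login_password=" + Uri.EscapeDataString(password)
              + "&securitytoken=guest"
              + "&cookieuser=checked"
              + "&do=login";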
If you use Live HTTP Headers for Firefox, when you log into the website you will get the POST information from the headers, something like this:

GET / HTTP/1.1
Host: www.microsoft.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: es-es,es;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: WT_FPC=id=82.144.112.152-154450144.30258861:lv=1351580394112:ss=1351575867559; WT_NVR_RU=0=msdn:1=:2=; omniID=0d2276c2_bbdd_4386_a11d_f8da1dbc5489; MUID=349E06C547426937362B02CC434269B9; MC1=GUID=47b2ed8aeea0de4797d3a40cf549dcbb&HASH=8aed&LV=201210&V=4&LU=1351608258765; A=I&I=AxUFAAAAAAALBwAAukh4HjpMmS4eKtKpWV0ljg!!&V=4; msdn=L=en-US

I suspect you may be able to build a Chrome extension that could do this for you.
By the way, you're not a "security expert" are you?

Why don't you use IE? Automating IE in Windows Forms is very simple, plus you can easily handle proxies as well.
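For instance, with the WinForms WebBrowser control the login can be scripted roughly like this (a sketch; "username", "password" and "loginButton" are hypothetical element IDs, so inspect the real page for the actual ones):

WebBrowser browser = new WebBrowser();
browser.DocumentCompleted += (s, e) =>
{
    // DocumentCompleted also fires for the post-login page, so guard
    // against the login elements being absent.
    HtmlElement user = browser.Document.GetElementById("username");
    if (user == null) return;
    user.SetAttribute("value", "myUser");
    browser.Document.GetElementById("password").SetAttribute("value", "myPass");
    browser.Document.GetElementById("loginButton").InvokeMember("click");
};
browser.Navigate("https://example.com/login.aspx");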


Using C# HttpClient to login on a website and scrape information from another page

I am trying to use C# and Chrome Web Inspector to login on http://www.morningstar.com and retrieve some information on the page http://financials.morningstar.com/income-statement/is.html?t=BTDPF&region=usa&culture=en-US.
I do not quite understand the mental process one must use to interpret the information from the Web Inspector in order to simulate a login, keep the session alive, and navigate to the next page to collect information.
Can someone explain or point me to a resource ?
For now, I have only some code to get the content of the home page and the login page:
public class Morningstar
{
    public async static void Ru4n()
    {
        var url = "http://www.morningstar.com/";
        var httpClient = new HttpClient();
        httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept", "text/html,application/xhtml+xml,application/xml");
        httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Encoding", "gzip, deflate");
        httpClient.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0");
        httpClient.DefaultRequestHeaders.TryAddWithoutValidation("Accept-Charset", "ISO-8859-1");

        var response = await httpClient.GetAsync(new Uri(url));
        response.EnsureSuccessStatusCode();
        using (var responseStream = await response.Content.ReadAsStreamAsync())
        using (var decompressedStream = new GZipStream(responseStream, CompressionMode.Decompress))
        using (var streamReader = new StreamReader(decompressedStream))
        {
            //Console.WriteLine(streamReader.ReadToEnd());
        }

        var loginURL = "https://members.morningstar.com/memberservice/login.aspx";
        response = await httpClient.GetAsync(new Uri(loginURL));
        response.EnsureSuccessStatusCode();
        using (var responseStream = await response.Content.ReadAsStreamAsync())
        using (var streamReader = new StreamReader(responseStream))
        {
            Console.WriteLine(streamReader.ReadToEnd());
        }
    }
}
EDIT: In the end, on the advice of Muhammed, I used the following piece of code:
ScrapingBrowser browser = new ScrapingBrowser();
//set UseDefaultCookiesParser as false if a website returns invalid cookies format
//browser.UseDefaultCookiesParser = false;
WebPage homePage = browser.NavigateToPage(new Uri("https://members.morningstar.com/memberservice/login.aspx"));
PageWebForm form = homePage.FindFormById("memberLoginForm");
form["email_textbox"] = "example#example.com";
form["pwd_textbox"] = "password";
form["go_button.x"] = "57";
form["go_button.y"] = "22";
form.Method = HttpVerb.Post;
WebPage resultsPage = form.Submit();
You should simulate the login process of the web site. The easiest way to do this is by inspecting the traffic with a debugging proxy (for example, Fiddler).
Here is the login request of the web site:
POST https://members.morningstar.com/memberservice/login.aspx?CustId=&CType=&CName=&RememberMe=true&CookieTime= HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: https://members.morningstar.com/memberservice/login.aspx
** omitted **
Cookie: cookies=true; TestCookieExist=Exist; fp=001140581745182496; __utma=172984700.91600904.1405817457.1405817457.1405817457.1; __utmb=172984700.8.10.1405817457; __utmz=172984700.1405817457.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmc=172984700; ASP.NET_SessionId=b5bpepm3pftgoz55to3ql4me
email_textbox=test@email.com&pwd_textbox=password&remember=on&email_textbox2=&go_button.x=36&go_button.y=16&__LASTFOCUS=&__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=omitted&__EVENTVALIDATION=omitted
When you inspect this, you'll see some cookies and form fields like "__VIEWSTATE". You'll need the actual values of these fields to log in. You can use the following steps:
Make a request and scrape fields like "__LASTFOCUS", "__EVENTTARGET", "__EVENTARGUMENT", "__VIEWSTATE", "__EVENTVALIDATION", and cookies.
Create a new POST request to the same page, use CookieContainer from the previous one; build a post string using scraped fields, username and password. Post it with MIME type application/x-www-form-urlencoded.
If successful, use the cookies for further requests to stay logged in.
Note: You can use HtmlAgilityPack or ScrapySharp to scrape the HTML. ScrapySharp provides easy-to-use tools for posting forms and browsing websites; a sketch of the first two steps follows.
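A rough sketch of those first two steps, using HttpWebRequest plus HtmlAgilityPack (the field names follow the request dump above; treat the XPath selectors as assumptions to verify against the real page):

// Step 1: GET the login page, keeping cookies, and scrape the hidden fields.
// Requires the HtmlAgilityPack NuGet package.
CookieContainer cookies = new CookieContainer();
HttpWebRequest getReq = (HttpWebRequest)WebRequest.Create("https://members.morningstar.com/memberservice/login.aspx");
getReq.CookieContainer = cookies;
string html;
using (HttpWebResponse resp = (HttpWebResponse)getReq.GetResponse())
using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
{
    html = reader.ReadToEnd();
}
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
string viewState = doc.DocumentNode.SelectSingleNode("//input[@name='__VIEWSTATE']").GetAttributeValue("value", "");
string eventValidation = doc.DocumentNode.SelectSingleNode("//input[@name='__EVENTVALIDATION']").GetAttributeValue("value", "");

// Step 2: POST the credentials plus the scraped fields with the SAME container.
string body = "email_textbox=" + Uri.EscapeDataString("user@example.com")
            + "&pwd_textbox=" + Uri.EscapeDataString("password")
            + "&__VIEWSTATE=" + Uri.EscapeDataString(viewState)
            + "&__EVENTVALIDATION=" + Uri.EscapeDataString(eventValidation);
HttpWebRequest postReq = (HttpWebRequest)WebRequest.Create("https://members.morningstar.com/memberservice/login.aspx");
postReq.Method = "POST";
postReq.ContentType = "application/x-www-form-urlencoded";
postReq.CookieContainer = cookies; // same container keeps the session
using (StreamWriter writer = new StreamWriter(postReq.GetRequestStream()))
{
    writer.Write(body);
}
using (HttpWebResponse postResp = (HttpWebResponse)postReq.GetResponse()) { }
// "cookies" now holds the authenticated session for further requests.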
The mental process is to simulate a person logging in to the website. Some logins are done with AJAX and others with a traditional POST request, so the first thing you need to do is make that request the same way the browser does. In the server response you will get cookies, headers and other information, and you need to use that info to build the next request; these are the requests your scraper must replicate.
Steps are:
1) Build a request, like the browser does, to authenticate yourself to the app.
2) Inspect the response, and save the headers, cookies, or other useful info needed to persist your session with the server.
3) Make another request to the server, using the info you gathered in step 2.
4) Inspect the response, and use a data analysis algorithm or something else to extract the data.
Tips:
You are not running a JavaScript engine here. Some websites use JavaScript to render graphs or perform interactions on the DOM; in those cases you may need to use a WebKit library wrapper.

Login to website using C#

Before everyone gets upset that this has been answered: I have scoured the web looking for how to do this and have tried a number of methods. "Login to website, via C#" and "How to programmatically log in to a website to screenscape?" were both helpful, but I cannot figure out why I cannot get past the login page. Here is my code:
string url = "https://www.advocare.com/login.aspx";
string url2 = "https://url.after.login";
HttpWebRequest wReq = WebRequest.Create(url) as HttpWebRequest;
wReq.KeepAlive = true;
wReq.Method = "POST";
wReq.AllowAutoRedirect = false;
wReq.ContentType = "application/x-www-form-urlencoded";
string postData = "ctl00$cphContent$txtUserName=Username&ctl00$cphContent$txtPassword=Password";
byte[] dataBytes = UTF8Encoding.UTF8.GetBytes(postData);
wReq.ContentLength = dataBytes.Length;
using (Stream postStream = wReq.GetRequestStream())
{
    postStream.Write(dataBytes, 0, dataBytes.Length);
}
HttpWebResponse wResp = wReq.GetResponse() as HttpWebResponse;
string pageSource;
wReq = WebRequest.Create(url2) as HttpWebRequest;
wReq.CookieContainer = new CookieContainer();
wReq.CookieContainer.Add(wResp.Cookies);
HttpWebResponse wResp2 = wReq.GetResponse() as HttpWebResponse;
using (StreamReader sr = new StreamReader(wResp2.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}
Every time I look at pageSource it is the HTML for the login.aspx page. I must be missing something here. Maybe it's not taking the cookie, I don't know. One question I have, aside from "why doesn't this work", concerns the string postData = "". Are those supposed to be the name or the id portion of the HTML tag? Any help on this is greatly appreciated as I am stumped and will have to find a different way. I would like to continue with WebRequest and WebResponse instead of using WebBrowser. If I can't, oh well. Thanks again for any help!
What are you trying to do besides log in? If it's something like QAing a site programmatically, I would suggest using Selenium and creating a C# app based on that. If you want, I can post a link to a base project for a Selenium-based app.
Don't necessarily view the page source, but look at the actual HTTP POST. Install an HTTP proxy such as Fiddler and then re-visit the page you are trying to emulate. Complete the HTTP POST request and check out the results produced in the proxy. From there you'll be able to see the actual parameters, cookies, headers, etc. that are being passed, and you can then attempt to replicate this in your code. It's often easy to miss something when simply viewing the HTML source, but monitoring the network traffic is pretty straightforward.
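For what it's worth, one thing stands out in the code above (an observation, not a verified fix): the first request never gets a CookieContainer, so wResp.Cookies will always be empty. Sharing one container across both requests looks roughly like this:

// Sketch: share one CookieContainer across both requests so the login
// cookies actually get captured and re-sent.
CookieContainer cookies = new CookieContainer();
HttpWebRequest wReq = WebRequest.Create(url) as HttpWebRequest;
wReq.CookieContainer = cookies;   // set BEFORE GetResponse() on the login POST
// ... POST the form data as above ...
HttpWebRequest wReq2 = WebRequest.Create(url2) as HttpWebRequest;
wReq2.CookieContainer = cookies;  // same container; no need to copy wResp.Cookies

An ASP.NET login form also usually expects hidden fields like __VIEWSTATE and __EVENTVALIDATION in the post body, as described in the Morningstar question above.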

Secure HttpWebRequest so I can send credentials - is it possible?

I have the following code which connects to my PHP server and retrieves data from it. The only thing is, I need to send the username and password securely from this web request to the PHP server. Looking at the docs for the WebRequest class, there is a Credentials property as well as a PreAuthenticate property. I'm assuming these are for network credentials (all my users are in AD).
Is it possible to secure this POST request with credentials, or is this just a bad idea? I've also found SetBasicAuthHeader - I'll read up on this and see if it might help. All traffic will be over SSL from the ASPX site to the PHP site.
// variables to store parameter values
string url = "https://myphpserver.php";
// creates the post data for the POST request
string postData = "Username=" + username + "&Password=" + password + "&UID=" + UniqueRecID;
// create the POST request
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.Method = "POST";
webRequest.ContentType = "application/x-www-form-urlencoded";
webRequest.ContentLength = Encoding.UTF8.GetByteCount(postData); // byte count, not character count
// POST the data
using (StreamWriter requestWriter2 = new StreamWriter(webRequest.GetRequestStream()))
{
    requestWriter2.Write(postData);
}
// This actually does the request and gets the response back
HttpWebResponse resp = (HttpWebResponse)webRequest.GetResponse();
string responseData = string.Empty;
using (StreamReader responseReader = new StreamReader(resp.GetResponseStream())) // reuse resp; don't call GetResponse() twice
{
    // dumps the HTML from the response into a string variable
    responseData = responseReader.ReadToEnd();
}
SetBasicAuthHeader is for HTTP Basic Access Authentication so won't help here as you're handling authentication at application level. Really, this is no more insecure than just going to the page in a browser. I see you're using SSL so your request will be encrypted anyway and you have nothing to worry about.
If you're concerned for some other reason (although I can't think why), it sounds like you have control over the PHP end so you could just encrypt the password and add an extra POST parameter so the server knows to decrypt it.
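If you did go that route, here is a minimal sketch, assuming the PHP end shares the AES key and IV out of band (the "enc" flag parameter is hypothetical):

// Hedged sketch: encrypt the password before POSTing. Note SSL already
// protects the transport, so this is belt-and-braces at most.
// Requires: using System.Security.Cryptography;
static string EncryptPassword(string password, byte[] key, byte[] iv)
{
    using (Aes aes = Aes.Create())
    {
        aes.Key = key; // e.g. 32 bytes for AES-256
        aes.IV = iv;   // 16 bytes
        using (ICryptoTransform encryptor = aes.CreateEncryptor())
        {
            byte[] plain = System.Text.Encoding.UTF8.GetBytes(password);
            byte[] cipher = encryptor.TransformFinalBlock(plain, 0, plain.Length);
            return Convert.ToBase64String(cipher);
        }
    }
}

// Usage: the hypothetical "enc=1" tells the server the value is encrypted.
// string postData = "Username=" + username
//     + "&Password=" + Uri.EscapeDataString(EncryptPassword(password, key, iv))
//     + "&UID=" + UniqueRecID + "&enc=1";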
When using HTTPS your data is protected in transit; nobody sniffing the packets can decode it. I suggest you read the Wikipedia article on HTTPS.

Logging in to eBay using HttpWebRequest fails due to 'The browser you are using is rejecting cookies' response

I'm trying to log in to my eBay account using the following code:
string signInURL = "https://signin.ebay.com/ws/eBayISAPI.dll?co_partnerid=2&siteid=0&UsingSSL=1";
string postData = String.Format("MfcISAPICommand=SignInWelcome&userid={0}&pass={1}", "username", "password");
string contentType = "application/x-www-form-urlencoded";
string method = "POST";
string userAgent = "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)";
CookieContainer cookieContainer = new CookieContainer();
HttpWebRequest req = (HttpWebRequest)HttpWebRequest.Create(signInURL);
req.CookieContainer = cookieContainer;
req.Method = method;
req.ContentType = contentType;
req.UserAgent = userAgent;
ASCIIEncoding encoding = new ASCIIEncoding();
byte[] loginDataBytes = encoding.GetBytes(postData);
req.ContentLength = loginDataBytes.Length;
Stream stream = req.GetRequestStream();
stream.Write(loginDataBytes, 0, loginDataBytes.Length);
stream.Close();
HttpWebResponse res = (HttpWebResponse)req.GetResponse();
StreamReader xsr = new StreamReader(res.GetResponseStream());
String responseText = xsr.ReadToEnd();
Obviously substituting my real username and password. When I look at the string responseText, I see that part of the response from eBay is
The browser you are using is rejecting cookies.
Any ideas what I'm doing wrong?
P.S. And yes, I am also using the eBay API, but this is for something slightly different than what I want to do with the API.
You're doing a direct HTTP request, but the eBay site expects to talk to a browser (probably so it can store the session cookie). Unless you make your request code smart enough to use cookies correctly it won't work. You'll probably have to use the Internet Explorer object instead.
Before doing the POST you need to download the page with the form that you are submitting in your code, take the cookie they give you, put it in your CookieContainer (making sure you get the path right) and post it back up in your request.
To clarify, while you might be POSTing the correct data, you are not sending the cookie that needs to go with it. You will get this cookie from the login page.
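Something along these lines (a sketch of the order of operations, not tested against eBay):

// Sketch: GET the sign-in page first so eBay's cookie lands in the container,
// then POST the login form using the SAME CookieContainer.
CookieContainer cookieContainer = new CookieContainer();
HttpWebRequest pageReq = (HttpWebRequest)WebRequest.Create(signInURL);
pageReq.CookieContainer = cookieContainer;
using (HttpWebResponse pageResp = (HttpWebResponse)pageReq.GetResponse()) { } // cookies captured
HttpWebRequest loginReq = (HttpWebRequest)WebRequest.Create(signInURL);
loginReq.CookieContainer = cookieContainer; // sends the captured cookies back
loginReq.Method = "POST";
loginReq.ContentType = "application/x-www-form-urlencoded";
using (StreamWriter writer = new StreamWriter(loginReq.GetRequestStream()))
{
    writer.Write(postData);
}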
You need to intercept the HTTP traffic to see exactly what happened. I use Fiddler2; it is a good tool for debugging HTTP, so you can tell which side is at fault: your application or the remote web server.
Using Fiddler, you can see the request header, and the response header with its cookies, as well as the response content. It sits in the middle, between your app and eBay.
Based on my experience, I think the problem is that the cookie eBay sent to you is not being sent back to the eBay server. Fiddler will prove whether or not that is the case.
Another thing: the response cookie you receive should be sent back on the next request, using the same CookieContainer.
You should be aware that CookieContainer has a bug in its .Add(Cookie) and .GetCookies(uri) methods. You may not be using them directly, but internal code might.
See the details and fix here:
http://dot-net-expertise.blogspot.com/2009/10/cookiecontainer-domain-handling-bug-fix.html
CallMeLaNN

you must use a browser that supports and has JavaScript enabled - c#

I am trying to post using HttpWebRequest and this is the response I keep getting back:
you must use a browser that supports and has JavaScript enabled
This is my post code:
HttpWebRequest myRequest = (HttpWebRequest)HttpWebRequest.Create(submitURL);
myRequest.Headers.Add("Accept-Language", "en-US");
myRequest.Accept = "*/*, text/xml";
myRequest.Method = WebRequestMethods.Http.Post;
myRequest.ContentType = "application/x-www-form-urlencoded"; // no stray "\n" + "\r" appended
myRequest.CookieContainer = cookieContainer;
myRequest.Headers.Add("UA-CPU", "x86");
myRequest.Headers.Add("Accept-Encoding", "gzip, deflate");
//cPostData section removed as submitting to SO
myRequest.ContentLength = cPostData.Length;
myRequest.ServicePoint.Expect100Continue = false;
using (StreamWriter streamWriter = new StreamWriter(myRequest.GetRequestStream()))
{
    streamWriter.Write(cPostData);
}
HttpWebResponse httpWebResponse = (HttpWebResponse)myRequest.GetResponse();
string stringResult;
using (StreamReader streamReader = new StreamReader(httpWebResponse.GetResponseStream()))
{
    stringResult = streamReader.ReadToEnd();
}
How do I avoid getting this error?
It is difficult to say what the exact problem is, because the server receiving your request doesn't think it is valid.
Perhaps the first thing to try would be to set the UserAgent property on your HttpWebRequest to some valid browser's user-agent string, as the server may be using this value to determine whether or not to serve the page.
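For example:

// Present a browser-like identity; some servers vary their response on it.
myRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1";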
This doesn't have anything to do with your code - the web server code has something that detects or relies on Javascript. Most likely a piece of Javascript on the page fills out (or modifies prior to posting) some hidden form field(s).
The solution to this is entirely dependent on what the web server is expecting to happen with that form data.
This is a layman's answer, not a 100% technically accurate description of the HttpWebRequest object, and it's written that way because of the amount of time a fuller answer would take to post. The first part of this answer sets up the final sentence.
The HttpWebRequest object basically acts as a browser interacting with web pages. It's a very simple browser with no UI, designed essentially to post to and read from web pages. As such, it does not support a variety of features found in a modern browser, such as JavaScript.
The page you are attempting to post to requires JavaScript, which the HttpWebRequest object does not support. If you have no control over the page that the WebRequest object is posting to, then you'll have to find another way to post to it. If you own or control the page, you will need to modify it to strip out items that require JavaScript (such as AJAX features, etc.).
Added
I purposely didn't add anything about specifying a user agent to try to trick the web server into thinking the HttpWebRequest object supports JavaScript, because it is likely that the page really does need JavaScript enabled in order to display properly. However, my assumptions often prove wrong, so I would agree with @Andrew Hare and say it's worth a try.
