How to screen scrape a site with dialog based authentication? - c#

I want to use Jsoup to screen scrape the contents of a website, but I have to log in to the site first. When I browse to the main page, I get a dialog asking for a username and password. Since it is not a form, Jsoup gets the "not authorized" page as the response. I tried to find the login URL using Firebug, but I guess the dialog appears before the other page components are loaded. So I don't know which parameters to pass for the username and password, or which service I need to post to.
This is a C#-based website, and I have seen this authentication mechanism on several SharePoint sites.
How should I handle this kind of login mechanism?

It sounds like the page is using HTTP Basic authentication. The browser shows that dialog in response to an HTTP 401 challenge, before any HTML is sent to the client, which is why you don't see it in Firebug.
You need to send the username and password in an HTTP Authorization header. Here's a link that shows how to do that with Jsoup:
Jsoup connection with basic access authentication
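Jsoup is a Java library, so here is the same idea sketched in C# with HttpClient. The Basic header is the word `Basic` followed by Base64 of `username:password`; sending it on the first request means the server never needs to show the dialog. The URL and credentials are placeholders. If the server instead challenges with NTLM or Negotiate (Windows authentication, which on-premises SharePoint often uses and which also shows a browser dialog), a hand-built Basic header will not be accepted; assigning a `NetworkCredential` to `HttpClientHandler.Credentials` is the usual route, shown as an alternative below.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public static class BasicAuthSketch
{
    // Credential part of "Authorization: Basic <credentials>": Base64("user:password").
    // This is encoding, not encryption, so only send it over HTTPS.
    public static string EncodeCredentials(string user, string password) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(user + ":" + password));

    // Builds a GET request that carries the Basic header up front.
    public static HttpRequestMessage CreateRequest(Uri url, string user, string password)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.Authorization =
            new AuthenticationHeaderValue("Basic", EncodeCredentials(user, password));
        return request;
    }

    // Alternative for NTLM/Negotiate (Windows auth): let the handler answer the
    // server's challenge instead of building the header by hand.
    public static HttpClient CreateWindowsAuthClient(string user, string password, string domain) =>
        new HttpClient(new HttpClientHandler
        {
            Credentials = new NetworkCredential(user, password, domain)
        });

    // Sends the request and returns the protected page's HTML for scraping.
    public static async Task<string> FetchAsync(HttpClient client, Uri url, string user, string password)
    {
        using var request = CreateRequest(url, user, password);
        using var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```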

Related

C# How to open web page, identify element, input data and wait for next page?

This question probably exists in different forms, but I need someone to explain how to accomplish the following.
I'm working on a Windows Forms application (C#). When I click a button on the form, I want to navigate to a specific page (all in code-behind), find an input[type=text] on that page by id or class, enter a password, and click the login button next to the input.
Then I need to wait for the page that loads after the login button is clicked before I continue identifying more elements. For example, I want to find an HTML table and traverse it.
If someone could give me a good example and tell me whether I need any additional controls on my form, I would be most grateful.
As I wrote above, I'm not interested in opening a browser and navigating to that page. I want it all to take place in code, so to speak.
Thanks in advance!
You don't need to scrape the website and find the input of type=text. Forms work with GET or POST requests, and a login form is generally a POST request to the server. Search for the form inside that page and see where its action attribute points. Let's say it looks like this:
<form action="login.php" method="post">
So you know that login.php will handle the request and that it's using the post method.
Now write some C# code to send a POST request to http://yoururl.com/login.php (see HttpWebRequest).
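A minimal sketch of that POST, using HttpClient (the newer API; HttpWebRequest works similarly). `FormUrlEncodedContent` produces the same `application/x-www-form-urlencoded` body a browser sends for `<form method="post">`. The field names passed in are hypothetical: use whatever `name` attributes the real form's inputs have.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class FormPostSketch
{
    // Builds the POST body in field order. The field names are parameters because
    // they must match the <input name="..."> values of the real login form.
    public static FormUrlEncodedContent BuildLoginForm(string userField, string user,
                                                       string passwordField, string password)
    {
        return new FormUrlEncodedContent(new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>(userField, user),
            new KeyValuePair<string, string>(passwordField, password)
        });
    }

    // Posts the form to the form's action URL (e.g. http://yoururl.com/login.php)
    // and returns the response HTML.
    public static async Task<string> PostLoginAsync(HttpClient client, Uri actionUrl,
                                                    FormUrlEncodedContent form)
    {
        using var response = await client.PostAsync(actionUrl, form);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```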
Since it's a login, you need a way to keep the cookies the server returns so that you can send them with your next request, to the page you have to access after logging in. Sending those cookies back is what keeps you logged in: they identify the session of the user you logged in with the previous POST request.
To achieve this, look at HttpWebRequest.CookieContainer.
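The idea is one `CookieContainer` per logged-in session, shared by every request. A sketch using `HttpClientHandler`, which fills and replays the container automatically; with HttpWebRequest the equivalent is assigning the same container to each request's `CookieContainer` property. The cookie name `PHPSESSID` in the test is only an illustrative assumption about a PHP site.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class CookieSessionSketch
{
    // The handler stores every Set-Cookie it receives (including the session cookie
    // from the login POST) in `jar`, and sends matching cookies on later requests.
    public static HttpClient CreateSessionClient(CookieContainer jar) =>
        new HttpClient(new HttpClientHandler { CookieContainer = jar, UseCookies = true });

    // Logs in with a form POST, then fetches a page that requires the session.
    public static async Task<string> LoginThenGetAsync(HttpClient client, Uri loginUrl,
                                                       HttpContent form, Uri protectedUrl)
    {
        using var login = await client.PostAsync(loginUrl, form);
        login.EnsureSuccessStatusCode();
        // Same client, same CookieContainer: the session cookie is sent with this GET.
        return await client.GetStringAsync(protectedUrl);
    }
}
```

The returned HTML can then be handed to a parser such as HtmlAgilityPack to find the table.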
Once you have the cookies, send a GET request to the next page, where you can scrape the information you need. A GET request to a web page returns the whole HTML page as the response. You should then use an HTML parsing library such as HtmlAgilityPack to get the table you need.
Try to write some code and come back when you face a problem, opening another question. I hope I provided you some useful information!

how to login to facebook with c# without the facebook page

I am pretty new to programming. I have seen many examples of logging in to Facebook from C#, but all of them use the Facebook login page. My question is: how do I pass the username and password from my own text box and my own login button? Is it possible?
Pretty sure you can't do this.
The only way to authenticate with Facebook is via a Facebook login page.
If Facebook allowed login via 3rd party forms, there would be nothing to stop malicious sites from collecting Facebook login details.
It really depends on what you are trying to accomplish. There are some integration APIs for Facebook if you have a site that you want to integrate with Facebook.
The only way I've found to log in directly from a desktop app is to use the WebBrowser class: request the Facebook login page, then populate the username/password fields programmatically:
webBrowser1.Document.GetElementById("textName").SetAttribute("value", "ddddd");
The login produces several cookies that identify the session, and you can transfer those into an HttpWebRequest so that your requests are seen as logged in.
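A sketch of that transfer, assuming the cookies are read from the WinForms `HtmlDocument.Cookie` property, which returns the page's `document.cookie` string ("name=value; name2=value2"). One caveat: cookies marked HttpOnly are not visible through `document.cookie`, so a session cookie set that way will be missing from this copy.

```csharp
using System;
using System.Net;

public static class BrowserCookieSketch
{
    // Copies a "name=value; name2=value2" cookie string into a CookieContainer that
    // can be assigned to HttpWebRequest.CookieContainer or an HttpClientHandler.
    public static CookieContainer FromCookieString(Uri site, string cookieString)
    {
        var jar = new CookieContainer();
        foreach (var part in cookieString.Split(';', StringSplitOptions.RemoveEmptyEntries))
        {
            var pair = part.Trim();
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // skip empty or malformed entries
            // Add(Uri, Cookie) scopes the cookie to the site's host.
            jar.Add(site, new Cookie(pair.Substring(0, eq), pair.Substring(eq + 1)));
        }
        return jar;
    }
}
```

In the form, usage would look like `var jar = BrowserCookieSketch.FromCookieString(webBrowser1.Url, webBrowser1.Document.Cookie);` followed by `request.CookieContainer = jar;` on each HttpWebRequest.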
Like Rik said, this is very bad practice, because you would have to collect the user's credentials through your own form. Even if you display the WebBrowser window directly to the user, you are still asking them to do something very risky, because the application hosting the WebBrowser class can intercept the login information.

URL rewrite in ASP.NET application

How do I redirect the URL based on the registered client in C# .NET or ASP.NET 4.0? For example, if a client registers as "client1" and our website is www.mycompany.com, every page the client visits should be served from www.client1.mycompany.com.
More detailed example:
For example, another client is created as Client2. The pages I have created are generally like
"www.mycompany.com/product.aspx"
"www.mycompany.com/categories.aspx" should be shown as
"www.client2.mycompany.com/product.aspx" and
"www.client2.mycompany.com/categories.aspx" respectively
I have searched the web and found solutions for static pages or for using Global.asax during application startup, but I haven't found anything that applies after the user has logged in.
I have done something similar before on a few sites, and there are a couple of methods you could use. Assuming you have DNS set up so that all subdomains (*.url.com) send users to your server, and IIS set up to handle them all in the same site (i.e. no host header required, just the IP), you can use one of the following methods:
After login, simply redirect the user to that URL. Since .NET doesn't care which host name was used, the server renders the page the same way, so it should be that simple. This assumes all your navigation uses relative paths, and you must enable cookie sharing for the domain: if the login cookie was issued on 1.url.com and you send the user to 2.url.com, the browser will only send it there if the cookie is scoped to the parent domain. Sharing cookies within the same domain requires a little work, but it can be done.
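A sketch of building the redirect target for that first method, assuming the wildcard DNS and IIS setup described above and the `www.<client>.mycompany.com` pattern from the question. `ClientUrlSketch.ForClient` is a hypothetical helper name; it keeps the scheme, port, path and query and only swaps the host.

```csharp
using System;

public static class ClientUrlSketch
{
    // Turns http://www.mycompany.com/product.aspx?id=3 into
    // http://www.client2.mycompany.com/product.aspx?id=3.
    // Assumes clientName is already a valid DNS label (letters, digits, hyphens).
    public static Uri ForClient(Uri current, string clientName, string baseDomain)
    {
        var builder = new UriBuilder(current)
        {
            Host = "www." + clientName.ToLowerInvariant() + "." + baseDomain
        };
        return builder.Uri;
    }
}
```

In an ASP.NET 4.0 login handler this would be used as `Response.Redirect(ClientUrlSketch.ForClient(Request.Url, clientName, "mycompany.com").AbsoluteUri);`. For Forms authentication, setting the `domain` attribute of the `<forms>` element in web.config (for example `domain=".mycompany.com"`) is one way to scope the login cookie to all subdomains.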
Create a generic login page that makes a web service request back to the server to check whether the user can log in. If so, have it send back to the browser a command, along with the correct URL, telling the client's browser to post directly to that site's login page (sending the username and password). This logs them into their own site and assigns the cookies correctly, all from one simple login page. You could even make an external login page that exists only for this purpose. In the end, all the generic page does is check whether they can log in and then send their credentials to the page that actually performs the login. I recommend doing this as a POST over SSL for security reasons.
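One way to realize the "post directly to that site's login page" step, sketched here as a server-rendered page with a self-submitting form rather than the web-service-plus-browser-command flow the answer describes. The login URL and the field names `username`/`password` are hypothetical; the page must be served over HTTPS, as recommended above, because it carries the password.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class AutoPostSketch
{
    // Renders a page whose form POSTs the given fields to the client's own login
    // page and submits itself on load. Every value is HTML-encoded.
    public static string BuildAutoPostPage(Uri loginUrl, IEnumerable<KeyValuePair<string, string>> fields)
    {
        var html = new StringBuilder();
        html.Append("<html><body onload=\"document.forms[0].submit()\">");
        html.Append("<form method=\"post\" action=\"")
            .Append(WebUtility.HtmlEncode(loginUrl.AbsoluteUri)).Append("\">");
        foreach (var field in fields)
        {
            html.Append("<input type=\"hidden\" name=\"").Append(WebUtility.HtmlEncode(field.Key))
                .Append("\" value=\"").Append(WebUtility.HtmlEncode(field.Value)).Append("\" />");
        }
        // Fallback button for browsers with JavaScript disabled.
        html.Append("<noscript><input type=\"submit\" value=\"Continue\" /></noscript>");
        html.Append("</form></body></html>");
        return html.ToString();
    }
}
```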
I hope that makes sense.
There's a project called UrlRewritingNet which I use - it's pretty old but the source is available so you could recompile it for 4.0.
Link is at http://urlrewriting.net/149/en/home.html

Login bypass using a C#.NET program

My goal is to provide a link to customers: when a customer clicks the link, he/she should be automatically logged in to a new site (an external website not controlled by our company) using their logged-in AD credentials.
FYI, the logged-in user names match the login names on this external website, and the password is the same for all users. So I can safely hardcode the password in my program.
What I was thinking of was writing a C# program that completes the authentication process for the external website and returns the page that is received after login.
My analysis:
1) When I first visit http://website2/default.aspx, it returns a login page with a username field, a password field and a submit button.
I also noticed that it returns a session ID cookie: ASP.NET_SessionId=i0j3d155mxxkuyr3fedp00yf.
2) Later, when I enter the username and password and click the submit button,
it sends a query string like this:
user=adf&password=adsf&buttonName=Login+%21
I think it is using an HTTP POST.
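Steps 1 and 2 map onto a GET followed by a POST on one cookie-aware client, so the `ASP.NET_SessionId` cookie from the GET is sent with the POST. Because default.aspx looks like an ASP.NET Web Forms page, the postback usually also needs the hidden fields the page rendered (such as `__VIEWSTATE` and `__EVENTVALIDATION`), so this sketch copies every hidden input into the POST. The field names `user`, `password` and `buttonName` come from the captured data above; the regex-based extraction is a simplification (an HTML parser is more robust), and the URL is a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class WebFormsLoginSketch
{
    // Collects the name/value of every <input type="hidden"> in the page.
    public static Dictionary<string, string> ExtractHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match tag in Regex.Matches(html, "<input[^>]*type=\"hidden\"[^>]*>", RegexOptions.IgnoreCase))
        {
            Match name = Regex.Match(tag.Value, "\\bname=\"([^\"]*)\"", RegexOptions.IgnoreCase);
            Match value = Regex.Match(tag.Value, "\\bvalue=\"([^\"]*)\"", RegexOptions.IgnoreCase);
            if (name.Success)
                fields[name.Groups[1].Value] = WebUtility.HtmlDecode(value.Success ? value.Groups[1].Value : "");
        }
        return fields;
    }

    // GET the login page (the handler keeps ASP.NET_SessionId), then POST the hidden
    // fields plus the visible ones, and return the page shown after login.
    public static async Task<string> LoginAsync(Uri loginPage, string user, string password)
    {
        var handler = new HttpClientHandler { CookieContainer = new CookieContainer() };
        using var client = new HttpClient(handler);

        string loginHtml = await client.GetStringAsync(loginPage);
        var fields = ExtractHiddenFields(loginHtml);
        fields["user"] = user;             // names taken from the captured request
        fields["password"] = password;
        fields["buttonName"] = "Login !";  // "Login+%21" decoded; the button value is often required

        using var response = await client.PostAsync(loginPage, new FormUrlEncodedContent(fields));
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```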
Can you please help me with this? If possible, please provide code that I can refer to and adapt to achieve this.
Thanks a lot for looking. Any help is appreciated.
You're not going to be able to do this since there is no way to retrieve a password from AD. The only way this would be possible is if there was a trust relationship between the external site and yours and it sounds like this is not the case (please correct me if I'm wrong!).
You'll need to have the user enter their credentials manually and pass that information on to the external site, which is really no better than having the user authenticate directly to the external site.
On a side note, it's rarely the case where you can safely hardcode a password in a program. Just sayin. Also, if you're getting a query string that you can see in the browser's address bar then it's not using POST, it's using GET.

C# auto-login to Sprint

I have a project where I am trying to log in to Sprint and then screen scrape data about the different lines the company controls. I have tried passing the cookies provided by the initial website call into the HttpWebRequest form post, but I don't get back any cookies that denote a user, a session or anything else. In fact, if I then use the WebClient class to get the landing page, the response URL I get back is the login page.
I think this is because, when you log in, you get redirected to a page that does some processing and then redirects you to the landing page. I am passing in correct credentials and don't know where it is failing. Can anyone help me so that I don't need to use WatiN or any other browser control to scrape the data, as that would be too slow?
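Two things commonly produce this symptom: the POST and the later GET don't share cookies (WebClient does not keep or send cookies unless you subclass it or set the Cookie header yourself), or one hop in the redirect chain is not an HTTP redirect at all. A sketch under the assumption that the login only needs cookies: one HttpClient with one CookieContainer and automatic 3xx redirects, plus a check for a `<meta http-equiv="refresh">` hop, which HttpClient does not follow on its own. Sprint's actual pages are not known here, so URLs and form fields are placeholders; if the intermediate page redirects with JavaScript, this will not help.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

public static class RedirectChainSketch
{
    // Returns the target of <meta http-equiv="refresh" content="0;url=...">, or null.
    // Simplification: expects http-equiv to appear before content in the tag.
    public static string? FindMetaRefreshUrl(string html)
    {
        Match m = Regex.Match(html,
            "<meta[^>]*http-equiv=[\"']?refresh[\"']?[^>]*content=[\"']?\\s*\\d+\\s*;\\s*url=([^\"'>]+)",
            RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value.Trim()) : null;
    }

    // Posts the login form and follows the whole chain with one cookie-aware client.
    public static async Task<string> LoginAndFollowAsync(Uri loginUrl, HttpContent form, int maxMetaHops = 5)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = new CookieContainer(), // cookies set on every hop are kept
            AllowAutoRedirect = true                  // 3xx responses are followed automatically
        };
        using var client = new HttpClient(handler);

        using var response = await client.PostAsync(loginUrl, form);
        Uri current = response.RequestMessage?.RequestUri ?? loginUrl; // URL after 3xx hops
        string html = await response.Content.ReadAsStringAsync();

        for (int hop = 0; hop < maxMetaHops; hop++)
        {
            string? next = FindMetaRefreshUrl(html);
            if (next == null) break;
            current = new Uri(current, next);            // resolve relative targets
            html = await client.GetStringAsync(current);  // same client, same cookies
        }
        return html;
    }
}
```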
Use Selenium.
It is normally used for website testing, but you can easily use it for your situation.
It lets you launch a browser and programmatically control mouse clicks and key presses to do exactly what you need.
You can also run XPath queries against the HTML to read data, or even run custom JavaScript on pages if you need something more complicated.
