Load and Use a Web Cookie - C#

I am writing a small web scraper in C#. It is going to use HttpWebRequest to fetch the HTML, find the required data, then report it back to the caller.
The required data is only available when a user is logged in. As I am new to interfacing programmatically with HTTP, JavaScript, et al., I am not going to try to log on programmatically. The user will have to log on to the website, and my program will get the stored cookie and load it into the CookieContainer for the HTTP request.
I've done enough research to know that the data belongs in the CookieContainer (I think), but I can't find an example anywhere of how to locate a cookie created by IE (or Firefox, or Chrome, etc.), load it programmatically, populate the CookieContainer, and send it with an HTTP GET request. So how does one do all that?
Thanks!

I'm afraid you can't do that. The main reason is security: since cookies are used to identify a user, browsers don't provide easy access to them from the outside. Otherwise it would be really easy to steal them.
You would be better off learning how to log the user in with HttpWebRequest or another class like that.
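Below is a minimal sketch of that approach, assuming the site uses a plain form-based login. The URL and form field names are hypothetical - inspect the real login form (e.g. with the browser's developer tools) and substitute your own values.

// Log in programmatically and reuse the resulting cookies for further requests.
using System;
using System.IO;
using System.Net;
using System.Text;

class LoginScraper
{
    static void Main()
    {
        var cookies = new CookieContainer();

        // 1. POST the login form; the server's Set-Cookie headers land in 'cookies'.
        var loginRequest = (HttpWebRequest)WebRequest.Create("https://example.com/login"); // hypothetical URL
        loginRequest.Method = "POST";
        loginRequest.ContentType = "application/x-www-form-urlencoded";
        loginRequest.CookieContainer = cookies;

        byte[] body = Encoding.UTF8.GetBytes("username=me&password=secret"); // hypothetical field names
        using (Stream requestStream = loginRequest.GetRequestStream())
            requestStream.Write(body, 0, body.Length);
        using (loginRequest.GetResponse()) { }

        // 2. Reuse the same CookieContainer for the page that requires a session.
        var pageRequest = (HttpWebRequest)WebRequest.Create("https://example.com/protected-page");
        pageRequest.CookieContainer = cookies;
        using (var response = (HttpWebResponse)pageRequest.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();
            Console.WriteLine(html.Length); // the scraped HTML is now ready for parsing
        }
    }
}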

Related

Log in to a website (like FB, Twitter) and crawl data with C#?

I am creating a console application in C# (Visual Studio),
but I don't know where to start.
First I want to log in (PhantomJS or Selenium?), then go to a specified website URL and extract the HTML.
I want to know how to save the login information in my web request.
Thank you.
Long story short, it's not easy to do that with web requests alone, because each site has its own way of managing cookies and security.
It's easier if you use a web browser control to log in first. From there, the browser holds a valid cookie and you can start crawling data.
I've done a similar thing with the Chegg website. For details, you can check out my repository https://github.com/hungqcao/chegg-solutions-saver
In your case it can get a little more complicated, since FB and Twitter may have 2-factor authentication or something similar, but the idea stays the same.
Let me know if you need help.
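As a rough sketch of that idea (assuming a WinForms app with a WebBrowser control and a placeholder target URL), you can copy the cookies the browser holds for the site into a CookieContainer and reuse them for plain HTTP requests. Note that HttpOnly cookies are not visible through Document.Cookie, so this only covers cookies that page scripts can see.

using System;
using System.IO;
using System.Net;
using System.Windows.Forms;

static class BrowserCookieCrawler
{
    // Call this after the user has logged in inside the WebBrowser control.
    public static string FetchWithBrowserCookies(WebBrowser browser, Uri targetUri)
    {
        // Copy the browser's cookies for this site into a CookieContainer.
        var cookies = new CookieContainer();
        string cookieHeader = browser.Document.Cookie; // "name=value; name2=value2"
        if (!string.IsNullOrEmpty(cookieHeader))
            cookies.SetCookies(targetUri, cookieHeader.Replace(";", ",")); // SetCookies expects a comma-separated list

        // Reuse those cookies for a plain HTTP request and return the HTML.
        var request = (HttpWebRequest)WebRequest.Create(targetUri);
        request.CookieContainer = cookies;
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}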

How to: Encrypt URL in WebBrowser Controls

I have a program that opens a web browser control and just displays a web page from our server. They can't navigate around or anything.
The users are not allowed to know the credentials required to login, so after some googling on how to log into a server I found this:
http://user_name:password@URL
This is hard-coded into the web browser control's code, and it works fine.
HOWEVER: Some smart ass managed to grab the credentials by using Wireshark, which captures all the packets sent from your machine.
Is there a way I can encrypt this so the users cannot find out?
I've tried other things like using POST, but with the way the page was set up, it was proving extremely difficult to get working. (It's an SSRS Report Manager web page.)
I forgot to include a link to this question: How to encrypt/decrypt the url in C#
^I cannot use this answer as I myself am not allowed to change any of the server setup!
Sorry if this is an awful question, I've tried searching around for the past few days but can't find anything that works.
Perhaps you could work around your issue with a layer of indirection - for example, you could create a simple MVC website that doesn't require any authentication (or indeed, requires some authentication that you fully control) and it is this site that actually makes the request to the SSRS page.
That way you can have full control over how you send authentication, and you need never worry about someone ever getting access to the actual SSRS system. Now if your solution requires the webpage to be interactive then I'm not sure this will work for you, but if it's just a static report, it might be the way to go.
i.e. your flow from the app would be
User logs into your app (or use Windows credentials, etc)
User clicks to request the SSRS page
Your app makes an HTTP request to your MVC application
Your MVC application makes the "real" HTTP request to SSRS (e.g. via HttpClient) and dumps the result back to the caller (for example, it could write the SSRS response via @Html.Raw in an MVC view). The credentials for SSRS will therefore never be sent by your app, so you don't need to worry about that problem any more...
Just a thought.
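Here's a minimal sketch of what that indirection layer could look like in ASP.NET MVC, assuming a placeholder SSRS URL and credentials (in practice they would come from protected configuration, and this controller would sit behind your own authentication):

using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Mvc;

public class ReportProxyController : Controller
{
    public async Task<ActionResult> Report()
    {
        var handler = new HttpClientHandler
        {
            // Credentials live server-side only; the user's browser never sees them.
            Credentials = new NetworkCredential("reportUser", "reportPassword") // placeholders
        };

        using (var client = new HttpClient(handler))
        {
            // Placeholder report URL - substitute the real SSRS path.
            string html = await client.GetStringAsync(
                "http://reportserver/Reports/Pages/Report.aspx?ItemPath=%2fMyReport");

            // Hand the rendered report HTML straight back to the caller.
            return Content(html, "text/html");
        }
    }
}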
Incidentally, you could take a look here for the various options that SSRS allows for authentication; you may find some method that suits (e.g. custom authentication) - I know you mentioned you can't change anything on the server, so I'm just including it for posterity.

Remote logging into a website and fetching HTML source with WPFs WebBrowser class

I am trying to write an application which logs the user in to a specific website after he inputs his account information, and then presents a specific page in the window which is only accessible after login.
I'm trying to do this with the WebBrowser Class from System.Windows.Controls.WebBrowser
However, even after searching other examples I can't seem to get past the login.
I used HttpFox to analyze the GET and POST data and found out that the cookies sent are _utma/b/c/z, clientid, csrftoken and sessionid, and that a sessionid is received in return.
OK, now I know that the _utma cookies are something to do with Google Analytics, so I think I can ignore them? The csrftoken seems to always have the same value.
Can anyone give me some hints how to make the POST request in c# with the webbrowser class?
Help is very much appreciated, thanks! :)
Update 1: I already know the general methods I have to use, but I'm having problems with the actual implementation. What should I include in the POST request, and how do I get and save the sessionid? Things like that. I couldn't find any working example where someone logs into a 3rd-party website with the help of the WebBrowser class.
You could use the WebClient.UploadData method to post the data and receive the response from your script.
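WebClient doesn't track cookies on its own, so a common pattern is to subclass it and attach a CookieContainer; a rough sketch follows. The login URL and form field names are placeholders - use whatever HttpFox showed in the captured POST (including the csrftoken).

using System;
using System.Collections.Specialized;
using System.Net;

class CookieAwareWebClient : WebClient
{
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
            http.CookieContainer = Cookies; // sessionid/csrftoken persist across calls
        return request;
    }
}

class Program
{
    static void Main()
    {
        using (var client = new CookieAwareWebClient())
        {
            // POST the login form; the sessionid cookie is stored in client.Cookies.
            var form = new NameValueCollection
            {
                ["username"] = "me",     // placeholder field names
                ["password"] = "secret",
                ["csrftoken"] = "value-read-from-the-login-page"
            };
            client.UploadValues("https://example.com/login", form); // placeholder URL

            // Subsequent requests reuse the same cookies automatically.
            string protectedHtml = client.DownloadString("https://example.com/members-only");
            Console.WriteLine(protectedHtml.Length);
        }
    }
}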

Crawl site and detect 3rd party cookies

I am writing a crawler to log all cookies being deployed by a set number of sites. I can pick up 1st-party cookies being set on a page visit using Selenium, but a limitation in the software means that it won't pick up 3rd-party cookies. Are there any other tools available that can pick up all the cookies?
Thanks.
If you are doing this as a one-time task, you can use something like the FireCookie extension to the Firefox browser, which lets you export all the cookies:
http://www.softwareishard.com/blog/firecookie/
If you want to automate this task and run it periodically, consider a solution like the following:
First get a list of pages that need to be crawled.
Then load each page consecutively into a web browser. It's not enough to simply fetch the HTML of the page, because you need to load and process all javascript, iframes, and so forth that might set cookies. It could probably be a headless browser such as PhantomJS ( http://www.phantomjs.org/ ) or some other solution as long as it actually renders the page like a browser would do.
Use a web proxy such as Charles proxy ( http://www.charlesproxy.com/ ) to record all the network requests from the browser. The recorded session can be saved and processed to extract all the cookie headers. Charles proxy has an API that can be used to export the session to an XML file, so you might be able to automate this part as well.
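As a sketch of that last post-processing step, assuming the proxy session is exported in HAR format (Charles can export HAR as well as its own XML), the following walks the recorded entries and collects every Set-Cookie header per request URL. It uses Newtonsoft.Json, and the file path is a placeholder.

using System;
using System.IO;
using System.Linq;
using Newtonsoft.Json.Linq;

class CookieExtractor
{
    static void Main()
    {
        // Load the exported proxy session (HAR is just JSON).
        var har = JObject.Parse(File.ReadAllText("session.har")); // placeholder path

        var entriesWithCookies = har["log"]["entries"]
            .Select(entry => new
            {
                Url = (string)entry["request"]["url"],
                SetCookies = entry["response"]["headers"]
                    .Where(h => string.Equals((string)h["name"], "Set-Cookie",
                                              StringComparison.OrdinalIgnoreCase))
                    .Select(h => (string)h["value"])
                    .ToList()
            })
            .Where(e => e.SetCookies.Count > 0);

        // Print every cookie grouped by the host that set it (3rd-party hosts included).
        foreach (var entry in entriesWithCookies)
        {
            Console.WriteLine(new Uri(entry.Url).Host);
            foreach (var cookie in entry.SetCookies)
                Console.WriteLine("    " + cookie);
        }
    }
}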
I believe you could use RegEx and ie.GetCookie() to collect all the cookies from a website. I haven't tried it myself, but as far as the documentation goes I think it'll be rather easy.

Easiest way to get web page source code from pages that require logins -- C#

So I play an online game that's web-based and I'd like to automate certain things with it using C#. The problem is that I can't simply use WebClient.DownloadData() because I need to be logged in to actually receive the source. The other alternative was to use the built-in web browser control, but that doesn't give me access to the source code. Any suggestions?
I don't think NetworkCredentials will work in all cases. This only works with "Basic" or "Negotiate" authentication.
I've done this before with an internal website for some load testing, but it sounds like you are trying to "game" the game. For that reason I won't go into details, but the login to the site is probably being done in the form of an HTTP POST when you hit the login button.
You'd have to trap the POST request and replicate it in your code and make sure that your implementation maintains the session state as well, because if the game site is written well at all it will make sure that the current session has logged in before doing anything game related.
You can set the login credentials on the webclient using its Credentials property before calling DownloadData:
WebClient client = new WebClient();
client.Credentials = new NetworkCredential("username", "password");
byte[] data = client.DownloadData("http://example.com/page"); // placeholder URL
EDIT: As mjmarsh points out, this will only work for sites that use a challenge-response method of authentication as part of a single request (I'm so used to dealing with this at work, I hadn't considered the other types!). If the site uses forms authentication (or indeed any other form of authentication), this method will not work as the authentication is not part of a single request - multiple requests are needed that you will need to handle yourself.
Network credentials will not work, as mjmarsh has already pointed out.
While web scraping we come across a lot of pages where a login is needed. One of the approaches I use is to install Fiddler and monitor the POST and GET requests while manually logging in to the site. This lets you see exactly how the browser performs the login, which you then need to recreate in code.
For example, most web servers use cookies to determine that the session is authenticated. So you can post the username and password to the web site and record the cookie; this cookie can then be used to access any further pages on the site.
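Here's a minimal sketch of that flow using HttpClient with a shared CookieContainer. The URL and form field names are placeholders - take the real ones from the Fiddler capture of a manual login.

using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

class GameClient
{
    static async Task Main()
    {
        var cookies = new CookieContainer();
        var handler = new HttpClientHandler { CookieContainer = cookies };

        using (var client = new HttpClient(handler))
        {
            // 1. Replay the login POST; the session cookie is stored in 'cookies'.
            var form = new FormUrlEncodedContent(new Dictionary<string, string>
            {
                ["username"] = "me",   // placeholder field names
                ["password"] = "secret"
            });
            await client.PostAsync("https://example.com/login", form); // placeholder URL

            // 2. Any further request through the same handler sends that session cookie.
            string html = await client.GetStringAsync("https://example.com/game/page");
            Console.WriteLine(html.Length);
        }
    }
}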
Please check following link to check out more about Advanced Web Scraping:
http://krishnan.co.in/blog/post/Web-Scraping-Yahoo-Mail.aspx
In this blog post, you will find out how to authenticate into a Yahoo account and then read a page after authentication.
