Web Scraper via Web Service API? - c#

How would I go about doing the following...
I want to build a web service for my application that grabs a piece of data from an external website that requires the user to log in. The website has no public API, hence the scraper.
Is there a library that performs the following functions? If not, what should I do?
Automate filling in the form and clicking
Automate the submit button
Check which URL the user has landed on, and redirect the user to a URL
Grab data from a label
EDIT: What I'm asking is whether there is a web service, library, etc. that makes it easier to perform screen scraping/automation functions.

Instead of filling in a form and virtually clicking buttons, look at the source of the form and figure out how the data is submitted. In most cases you can simply send a POST request with the login data. If there is something more than a simple POST involved, I use this addon to figure out which requests are being made that you can't see. In C#, I would use the HttpWebRequest class, because it handles cookies for you once you assign it a CookieContainer.
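The approach in this answer can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name, method names, URLs, and form field names are all hypothetical, and the real field names have to be read from the login form's HTML. It uses HttpWebRequest as the answer suggests; on newer .NET versions HttpClient with HttpClientHandler.CookieContainer is the more usual choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;

public static class LoginScraper
{
    // Encodes form fields the way a browser does for
    // application/x-www-form-urlencoded submissions.
    public static string BuildFormBody(IDictionary<string, string> fields)
    {
        return string.Join("&", fields.Select(kv =>
            WebUtility.UrlEncode(kv.Key) + "=" + WebUtility.UrlEncode(kv.Value)));
    }

    // Posts the login form, then fetches a protected page with the same cookies.
    // loginUrl, dataUrl, and the field names are assumptions for illustration.
    public static string LoginAndFetch(string loginUrl, string dataUrl,
                                       IDictionary<string, string> fields)
    {
        // One shared container so the session cookie set at login is sent later.
        var cookies = new CookieContainer();

        var login = (HttpWebRequest)WebRequest.Create(loginUrl);
        login.Method = "POST";
        login.ContentType = "application/x-www-form-urlencoded";
        login.CookieContainer = cookies;
        byte[] body = Encoding.UTF8.GetBytes(BuildFormBody(fields));
        login.ContentLength = body.Length;
        using (Stream stream = login.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)login.GetResponse())
        {
            // ResponseUri is the URL the request ended up on after any redirects.
            Console.WriteLine("Landed on: " + response.ResponseUri);
        }

        var page = (HttpWebRequest)WebRequest.Create(dataUrl);
        page.CookieContainer = cookies;
        using (var response = (HttpWebResponse)page.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd(); // raw HTML; extracting the label is a separate step
        }
    }
}
```

The returned HTML still has to be parsed to pull out the label text, which this sketch leaves out.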

If the website does not ban robots, you can use YQL to simulate everything you need. However, it can be a bit difficult or impossible as you basically have to implement a text-only browser within JS.

Related

How to dialogue with an SSO service and open URL

I'm a JS/jQuery developer who's dipping his toe for the first time in the C#/.NET world with a new web service.
There's an external SSO service that I need to communicate with. I send it a URL with some query string parameters, it replies with a URL that includes an SSO token, then I need to pop that URL into the end user's browser.
Any pointers on how to do this with C#?
Some additional info... I'm trying to modify some existing code that sort of did something similar using System.Net.HttpWebRequest/HttpWebResponse and an HTML form with hidden inputs, but I'm a bit lost trying to make sense of what the code is doing, and anyway it uses a form with POST, whereas the SSO service I'm connecting to just uses query string parameters.
Just to close the loop on this, I resolved it by... using JavaScript. Basically, the C# code injects some JS into the page to perform the redirect. Probably not the best way to do this, but it worked...

How to: Encrypt URL in WebBrowser Controls

I have a program that opens a web browser control and just displays a web page from our server. They can't navigate around or anything.
The users are not allowed to know the credentials required to login, so after some googling on how to log into a server I found this:
http://user_name:password@URL
This is hard-coded into the web browser control's code, and it works fine.
HOWEVER: Some smart ass managed to grab the credentials by using Wireshark, which captures the packets sent from your machine.
Is there a way I can encrypt this so the users cannot find out?
I've tried other things like using POST, but with the way the page was set up, it proved extremely difficult to get working. (It's an SSRS Report Manager webpage.)
I forgot to include a link to this question: How to encrypt/decrypt the url in C#
^I cannot use this answer as I myself am not allowed to change any of the server setup!
Sorry if this is an awful question, I've tried searching around for the past few days but can't find anything that works.
Perhaps you could work around your issue with a layer of indirection - for example, you could create a simple MVC website that doesn't require any authentication (or indeed, requires some authentication that you fully control) and it is this site that actually makes the request to the SSRS page.
That way you can have full control over how you send authentication, and you need never worry about someone ever getting access to the actual SSRS system. Now if your solution requires the webpage to be interactive then I'm not sure this will work for you, but if it's just a static report, it might be the way to go.
i.e. your flow from the app would be
User logs into your app (or use Windows credentials, etc)
User clicks to request the SSRS page
Your app makes an HTTP request to your MVC application
Your MVC application makes the "real" HTTP request to SSRS (e.g. via HttpClient) and dumps the result back to the caller (for example, it could write the SSRS response via @Html.Raw in an MVC view)
The credentials for SSRS will therefore never be sent by your app, so you don't need to worry about that problem any more...
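The server-side step of that flow might look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, and it assumes the SSRS server accepts Windows-style credentials passed through HttpClientHandler.Credentials, which depends on how the server is configured.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class ReportProxy
{
    // Builds a client that authenticates with credentials stored on the
    // server (for example in protected configuration), never on the desktop app.
    public static HttpClient CreateClient(string user, string password, string domain)
    {
        var handler = new HttpClientHandler
        {
            Credentials = new NetworkCredential(user, password, domain)
        };
        return new HttpClient(handler);
    }

    // Fetches the report HTML so the MVC action can hand it back to the caller.
    // reportUrl is a placeholder for the real SSRS report address.
    public static Task<string> FetchReportAsync(HttpClient client, string reportUrl)
    {
        return client.GetStringAsync(reportUrl);
    }
}
```

An MVC action would call FetchReportAsync and pass the result to its view, so the only traffic leaving the user's machine is the request to the proxy site.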
Just a thought.
Incidentally, you could take a look here for the various options that SSRS allows for authentication; you may find a method that suits (e.g. custom authentication). I know you mentioned you can't change anything on the server, so I'm just including it for posterity.

Service to ASP.NET Communication

A little background... I have a .NET webpage that communicates one way with a service (using OnCustomCommand()). When the user presses a button, a function is called, which is all well and good. However, when the function is done executing, I need to be able to send a message, function call, or some other communication back to the .NET webpage.
Is there a way for my service to call a function, send message or update my .Net webpage?
I've looked around and seen mostly .NET -> Service but nothing seems to go the other way.
EDIT: It's a Windows service, and the ASP.NET page and the Windows service reside on the same server.
Have the service write the output to a common area... such as a shared file, or a database. Then refresh the webpage and have it query that file for the response output.
To support more than one user, you should have some session ID that determines where the output is saved. For example, pass a GUID as a command-line parameter and write the output like this:
Echo This is a test > "c:\Some Directory\Session12345.txt"
Then have your aspx page query and refresh using a GET like this: http://example.com/GetOutput.aspx?Session=12345
From there, use ASP.NET to open the file whose name contains the session ID taken from the URL.
You can extend this concept to work with jQuery and WCF as needed. Of course, you will need to add security to this to prevent MITM attacks. But it sounds like this is a small project not connected to the internet, so the extra features may not be that important.
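A rough sketch of the shared-file handoff described in this answer follows. The class name, method names, and file naming scheme are hypothetical. Treating the session ID as a GUID is an added assumption that also stops a crafted query string from pointing the page at an arbitrary file.

```csharp
using System;
using System.IO;

public static class SessionOutput
{
    // Builds the per-session file path. Parsing the ID as a GUID rejects
    // values such as "..\web.config" that could escape the shared folder.
    public static string PathFor(string sharedDir, string sessionId)
    {
        Guid id;
        if (!Guid.TryParse(sessionId, out id))
        {
            throw new ArgumentException("Session ID must be a GUID.", "sessionId");
        }
        return Path.Combine(sharedDir, "Session" + id.ToString("N") + ".txt");
    }

    // Called by the Windows service when its work is finished.
    public static void Write(string sharedDir, string sessionId, string output)
    {
        File.WriteAllText(PathFor(sharedDir, sessionId), output);
    }

    // Called by the page on each refresh; returns null while no output exists yet.
    public static string TryRead(string sharedDir, string sessionId)
    {
        string path = PathFor(sharedDir, sessionId);
        return File.Exists(path) ? File.ReadAllText(path) : null;
    }
}
```

The page would call TryRead with the Session value from the query string and keep refreshing until it gets a non-null result.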
Communication can only be initiated from client to server. Use Ajax, a web service, or a ScriptMethod to retrieve the status of the service call.

Send HTTP Post with default browser with C#

I am wondering if it is possible to send POST data with the default browser of a computer in C#.
Here is the situation. My client would like their C# application to open the browser and send client information to a web form. This web form would be behind a login screen. The assumption on the application side is that once the client data is sent to the login screen, the login screen passes that information on to the web form to prepopulate it. This would be done over HTTPS, and the client would like it done with a POST rather than a GET, since with a GET the client information would be sent as plain text.
I have found some wonderful solutions that do POSTS and handle the requests. As an example
http://geekswithblogs.net/rakker/archive/2006/04/21/76044.aspx
So the TL;DR version of this would be
1) Open Browser
2) Open some URL with POST data
Thanks for your help,
Paul
I've handled a similar situation once by generating an HTML page on the fly, with a form set up with hidden values for everything. There was a bit of JavaScript on the page so that when it loaded, it would submit the form, thereby posting the data as necessary.
I suspect this method would work for you.
Generate a dictionary of fields and values
Generate an HTML page with the Javascript to automatically submit when page is loaded
Write page to a temp location on disk
Launch default browser with that page
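The four steps above could be sketched like this. The class and method names are hypothetical, and the action URL and field names would come from the real web form.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Net;
using System.Text;

public static class BrowserPost
{
    // Builds a page whose only job is to POST the hidden fields on load.
    public static string BuildAutoPostPage(string actionUrl, IDictionary<string, string> fields)
    {
        var html = new StringBuilder();
        html.Append("<!DOCTYPE html><html><body onload=\"document.forms[0].submit()\">");
        html.Append("<form method=\"post\" action=\"")
            .Append(WebUtility.HtmlEncode(actionUrl)).Append("\">");
        foreach (var field in fields)
        {
            // HtmlEncode keeps quotes and angle brackets from breaking the markup.
            html.Append("<input type=\"hidden\" name=\"")
                .Append(WebUtility.HtmlEncode(field.Key))
                .Append("\" value=\"")
                .Append(WebUtility.HtmlEncode(field.Value))
                .Append("\">");
        }
        html.Append("</form></body></html>");
        return html.ToString();
    }

    // Writes the page to a temp file and opens it with the default browser.
    // Returns the file path so the caller can delete it afterwards.
    public static string OpenInDefaultBrowser(string actionUrl, IDictionary<string, string> fields)
    {
        string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".html");
        File.WriteAllText(path, BuildAutoPostPage(actionUrl, fields));
        // UseShellExecute lets the OS pick the handler registered for .html files.
        Process.Start(new ProcessStartInfo(path) { UseShellExecute = true });
        return path;
    }
}
```

Note that the temp file holds the client data unencrypted on disk, so it is worth deleting once the browser has loaded it.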
Remember, though, that POST data is sent as plain text as well unless the connection uses HTTPS. POST is generally the way to go for more than a couple of fields, as you can fit in more data (some browsers limit URLs to roughly 2,000 characters) and your user sees a friendly URL in their browser.
Nothing is sent as plain text when you use SSL; it is encrypted. Unless you control which browser is the default (IE, Firefox, Chrome, etc.), you'll have to work out what the default browser is and use its API to do this work (if that's even possible).
What would probably be much faster and more efficient would be to open the default browser by invoking a URL with Process.Start and passing the information on the query string (this is doing a GET instead of a POST, which I know isn't what you're asking for).
The response from the server could be a redirect, and the redirect could send down the filled-out form (storing the values in session or something similar).
That way the complexity is pushed to the website and not the windows application, which should be easier to update if something goes wrong.
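A minimal sketch of that GET-based alternative, with hypothetical class and method names and a placeholder URL:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class BrowserGet
{
    // Appends the fields to the base URL as an escaped query string.
    public static string BuildUrl(string baseUrl, IDictionary<string, string> query)
    {
        string qs = string.Join("&", query.Select(kv =>
            Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
        return qs.Length == 0 ? baseUrl : baseUrl + "?" + qs;
    }

    // Hands the URL to the OS, which opens it in the default browser.
    public static void Open(string url)
    {
        Process.Start(new ProcessStartInfo(url) { UseShellExecute = true });
    }
}
```

For example, BuildUrl("https://example.com/prefill", ...) with a single field name = "Ann Lee" yields https://example.com/prefill?name=Ann%20Lee, and the server can then store the values and redirect to the filled-out form.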
HTH
Can you compile your logic in C# and then call it from PowerShell? From PowerShell you can very easily automate Internet Explorer. This is IE only, but you might also be able to use WatiN.
Anything you put at the end of the URL counts as the querystring, which is what GET fills. It is more visible than the POSTed data in the body, but no more secure with regard to a sniffer.
So, in short, no.

C# Forms authentication with form data

I have two applications, say app A and app B. App A sends form data (using the POST method) to app B. B, however, is a web application and uses forms authentication. The POST data is sent to a webpage (viewdocument.aspx) that is secured by forms authentication. But when the data is sent to viewdocument, the login page is displayed because the user isn't authenticated.
The point is, I want the post data to be read by viewdocument. How can I do this?
You can allow all users to access your viewdocument page (by setting authorization in your web.config), get the values of the post in your page load and then, manually do:
if (!User.Identity.IsAuthenticated)
{
    FormsAuthentication.RedirectToLoginPage();
    return; // the redirect call does not end the request by itself
}
// Otherwise, continue with the page display
This way, you will protect the display of your page but still be able to send data to the page as any user.
I hope it will help
If your web app only needs to accept data, use web services.
I think you want to consider separating the two processes: accepting data from another website, and displaying data to a user. This way you get a nice separation of logic, which can improve maintainability. I'm also not sure how you would POST data from one website to another, as a POST should go back to the original webpage. I would do as @Kane suggested in his comment and use a service to accept the incoming data. This could be built to accept the current data, but would also be easily extensible if you ever need to receive data from other sites. Your page for displaying the data would then be a lot simpler and clearer for developers to work on.
