I want to monitor changes in a complex web application in the background. It is a single-page application with many scripts and so on. I need to be logged in to have access to the data I want to monitor.
I tried using WebRequest, but I think the application is too complex to do it that way. There is also a problem with authentication.
I also tried the WebBrowser component, but the web application tells me that this browser is too old and I should get a newer one.
The perfect solution would:
Open this web application in Chrome (or some other modern browser) in the background
Save the page to memory
Extract values using something like HtmlAgilityPack
While this is happening I want to use the computer normally (so opening a Chrome window is not a good solution for me).
Is there any way to achieve something like that?
If you can cope with an extra browser running, have a look at SeleniumHQ. With its WebDriver-backed Selenium you can start a dedicated browser instance and perform user actions by coding in a high-level programming language like Java or C#. It should not interfere with your manual work at all, but it will take up about the same amount of memory and CPU time as your "real" browser.
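As an illustration, here is a minimal C# sketch of that approach using Selenium WebDriver (it assumes the Selenium.WebDriver package and chromedriver on the PATH; the URL and element selectors are hypothetical placeholders):

    using OpenQA.Selenium;
    using OpenQA.Selenium.Chrome;

    class Monitor
    {
        static void Main()
        {
            // Starts a dedicated Chrome instance; your own browser is untouched.
            using (IWebDriver driver = new ChromeDriver())
            {
                driver.Navigate().GoToUrl("https://app.example.com/login");

                // Log in, since the data requires authentication.
                driver.FindElement(By.Id("username")).SendKeys("user");
                driver.FindElement(By.Id("password")).SendKeys("secret");
                driver.FindElement(By.Id("loginButton")).Click();

                // Read the value to monitor straight from the rendered DOM.
                string value = driver.FindElement(By.CssSelector(".watched-value")).Text;
                System.Console.WriteLine(value);
            }
        }
    }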
If the web application has no CAPTCHA and does not object to automated scripts accessing it, you could also log in from a background program by sending the appropriate HTTP requests and parsing the responses. Python's urllib2 would be my first choice.
If you don't want any additional processes running, you could also create a browser plugin that auto-refreshes and parses a certain open tab every few seconds.
I've looked through several similar questions on SO but haven't found something quite like what I need, so my question is this:
I want to take a screenshot (thumbnail) of a URL after the user provides one. I was going to use Awesomium because they provide a fairly simple solution for screengrabs. Unfortunately, Awesomium won't compile in an x64 application, and since I'm building this with ASP.NET for Windows Azure, I can't switch to x86.
So I'm left with a less elegant solution: using the Windows.Forms WebBrowser to load the URL and take the screenshot (as shown here: http://www.codeproject.com/Articles/95439/Get-ASP-NET-C-2-0-Website-Thumbnail-Screenshot).
Ugly, I know, but it works with most pages (there is the occasional white screenshot). Now, however, I'm concerned with security.
If the user inputs a malicious URL and the WebBrowser loads it, what is to stop it from running harmful code and downloading a virus to the server where the app is hosted?
There are several services and websites that offer similar functionality, albeit with different approaches, but the core idea is the same: the site must open up the URL and render the page in order to grab the screenshot. So what kind of measures would one expect them to take to thwart viruses and malicious URLs?
The biggest threat to your application would be client script executing in your browser control (i.e. JavaScript and client-side VBScript). It appears it is not possible to disable JavaScript programmatically in the WebBrowser object:
VB.NET WebBrowser disable javascript
Disable javascript in WinForms WebBrowser control?
Stripping <script> tags, as suggested in the first question's first answer, is not the way to go for security, as there are so many other ways script can get inserted.
Changing window.alert, as in the second answer, won't work either: it requires the page to load fully first, and script can execute before then. It would also only stop the alert function, not prevent script code in any other way.
Changing the registry settings as suggested in this answer may be the way to go; this appears to be the same as setting Internet Explorer's security level to High for the Internet zone (or selecting Custom and disabling Active Scripting). If you are always in control of the machine your app runs on, then manually disabling scripting in the Internet Explorer options could be a viable solution.
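For illustration, a hedged C# sketch of that registry change. It assumes the standard per-user zone-settings key, where zone 3 is the Internet zone and value 1400 controls Active Scripting:

    using Microsoft.Win32;

    class ScriptSettings
    {
        static void DisableActiveScripting()
        {
            // Zone 3 = Internet zone; value 1400 = the Active Scripting action.
            const string zoneKey =
                @"Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones\3";

            using (RegistryKey key = Registry.CurrentUser.OpenSubKey(zoneKey, writable: true))
            {
                if (key != null)
                    key.SetValue("1400", 3, RegistryValueKind.DWord); // 0=enable, 1=prompt, 3=disable
            }
        }
    }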
Most client-side internet threats such as drive-by downloads involve script in some way, so this approach will go a long way in protecting your app.
However, there are other exploits such as the Windows Metafile vulnerability that can harm a client machine.
Viewing a website in a web browser that automatically opens WMF files, in which case any potential malicious code may be automatically downloaded and opened. Internet Explorer, the default Web browser for all versions of Microsoft Windows since 1996, does this.
However, making sure your machines are patched with the latest Windows Updates will secure you against threats like these. That leaves zero-day attacks against Internet Explorer or the WebBrowser object, which you will not be able to do much about. I would suggest running your app on an isolated machine (or VM) which then uploads the screenshot to another server (e.g. via the web); this would help mitigate threats in this scenario.
I have a web-based picking/packing solution for delivering orders (ASP.NET/C#). Orders are marked as packed in the browser, and the label information is then immediately added to our database, ready for the next part...
The label printing is done via a Windows application (written in C#). It was done this way because I couldn't find a way to get the browser to print the label automatically (i.e. without the user having to click Print/OK, etc.).
The problem:
The Windows application polls every 10 seconds (subject to change) to see if there are any new labels for that picker/packer. Now, if I could get the browser to communicate with the label application, the polling would be unnecessary, since the picker/packer would have just clicked "Ready to Ship" and the label data would already have been created.
The data pulled down by the polling process isn't vast, but I'm concerned that as we add more picker/packer stations, the polling could have a knock-on effect on the web server/database (since all stations would be polling). Also, pickers/packers don't want to wait around for labels, so extending the polling interval isn't an option (if anything, I'd like it as quick as possible).
Solutions?
So, ideally, I'd like a way of communicating between the browser and the application (if possible), or any method that removes the need for polling. Perhaps something akin to Comet, which would allow the server to send a message to the application when a new label is added.
Ideally, a solution that wouldn't require a specific browser. But this may be asking too much.
A long-term solution would be to move the web-based picking-packing solution into the label application, but that would be a lot of work!
I hope that's clear and not too wordy. Let me know if I can add any other details in here. Thanks in advance.
Edit
I'm looking into WebSockets as an idea. Any advice would be more than welcome!
Update
Thanks for all comments. I've now got a few ideas on how to solve the problem:
WebSockets. May be problematic with firewall issues, since I don't have easy access to the system (geographical distance).
Read browser cookies from the application. A possible solution: http://www.codeproject.com/Articles/330142/Cookie-Quest-A-Quest-to-Read-Cookies-from-Four-Pop. This covers all the browsers in use in the warehouse. I could poll the local cookie values to see if any new labels have been created, then download them; that way there is no polling of the database server.
ActiveX control. Limited to IE, and there may be security/setup issues with installing it on each PC.
Leave the code as is and gauge whether the load on the database server is acceptable.
You could create a local WebSocket server in your C# application and then make the browser connect to it and send the data you need to print.
I'm not sure, though, that this is what you need. As I see it, you need to pass graphical data to your application, which could be really tricky to do using only JavaScript.
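If you do go the local WebSocket route, a minimal sketch might look like this. It uses HttpListener.AcceptWebSocketAsync, which requires .NET 4.5 on Windows 8 or later; the port and the message handling are assumptions:

    using System;
    using System.Net;
    using System.Net.WebSockets;
    using System.Text;
    using System.Threading;
    using System.Threading.Tasks;

    class LabelSocketServer
    {
        static void Main()
        {
            Run().GetAwaiter().GetResult();
        }

        static async Task Run()
        {
            var listener = new HttpListener();
            listener.Prefixes.Add("http://localhost:8181/labels/");
            listener.Start();

            while (true)
            {
                HttpListenerContext context = await listener.GetContextAsync();
                if (!context.Request.IsWebSocketRequest)
                {
                    context.Response.StatusCode = 400;
                    context.Response.Close();
                    continue;
                }

                // The page connects with: new WebSocket("ws://localhost:8181/labels/")
                WebSocket socket = (await context.AcceptWebSocketAsync(null)).WebSocket;
                var buffer = new byte[4096];
                WebSocketReceiveResult result = await socket.ReceiveAsync(
                    new ArraySegment<byte>(buffer), CancellationToken.None);

                // Hand the label data off to the printing code here.
                string labelData = Encoding.UTF8.GetString(buffer, 0, result.Count);
                Console.WriteLine("Received label: " + labelData);
            }
        }
    }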
The appropriate way to achieve the communication between a web application and a desktop application would be to go through a server both apps talk to.
You can get any web server (e.g. node.js, nodejs.org, which lets you use the same JavaScript on the server that you use in the web app) and interact with it. How you talk to the server from the desktop app depends on its technology, but all languages have some way to do HTTP communication, such as SOAP.
Alternatively, you can have both apps talk to the server using socket.io; you can borrow code from the following project.
Create an MSMQ queue (or a queue implementation of your choice) and host a WCF service in your windows application that polls the queue.
Have your ASP.NET application write the relevant information to this queue, so that the WCF service in the windows app that pulls the information will know what to make of it and print your labels.
The reason I mention a queue is reliability: if your windows app goes down for any reason, the queue will at least be preserved, waiting for you to bring the windows application back up.
Although there is a bit of polling involved, it is very quick and almost negligible. Implementing it is automatic with NetMsmqBinding; it's all taken care of, and all you need to do is configure it.
If you go for a non-MSMQ queue, I don't know whether you can still use NetMsmqBinding; you may have to create your own binding.
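To make the MSMQ idea concrete, here is a hedged sketch of the windows-app side. The queue path and contract are illustrative; the ASP.NET side would send to the same queue through a ChannelFactory<ILabelService> with a matching binding:

    using System;
    using System.ServiceModel;

    [ServiceContract]
    public interface ILabelService
    {
        // MSMQ transport is inherently one-way.
        [OperationContract(IsOneWay = true)]
        void PrintLabel(string labelXml);
    }

    public class LabelService : ILabelService
    {
        public void PrintLabel(string labelXml)
        {
            // Hand off to the existing label-printing code here.
            Console.WriteLine("Printing label: " + labelXml);
        }
    }

    class Host
    {
        static void Main()
        {
            using (var host = new ServiceHost(typeof(LabelService)))
            {
                host.AddServiceEndpoint(
                    typeof(ILabelService),
                    new NetMsmqBinding(NetMsmqSecurityMode.None),
                    "net.msmq://localhost/private/labels");
                host.Open();
                Console.WriteLine("Waiting for label messages; press Enter to exit.");
                Console.ReadLine();
            }
        }
    }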
I'm not sure, but it seems your application is polling a filesystem for these new labels to print? Have you considered using a FileSystemWatcher in your application? You can set it to watch a directory and be notified of anything new.
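For example, a minimal sketch, assuming the labels land in a known directory as files (the path and filter are placeholders):

    using System;
    using System.IO;

    class LabelWatcher
    {
        static void Main()
        {
            var watcher = new FileSystemWatcher(@"C:\labels", "*.lbl");
            watcher.Created += (sender, e) =>
            {
                // Fires as soon as a new label file appears; no polling needed.
                Console.WriteLine("New label file: " + e.FullPath);
            };
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching for labels; press Enter to exit.");
            Console.ReadLine();
        }
    }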
Background:
I am creating a Windows Forms app that automates order entry on an intranet web application. We have a large amount of order entry that needs to be done that will cost us a lot of money, so I volunteered to automate the task.
Problem:
I am using the WebBrowser class to navigate the web app. I have gotten very far but have hit a roadblock. There is a part of the app that opens a web dialog page. How do I interact with the web dialog? My instance of the WebBrowser class is still on the parent page. I am hoping someone can point me in the right direction.
You've got a number of options. To expand on the answers from others and add a new idea...
Do it using the WebBrowser control: this is technically possible by either injecting JavaScript into the target page, as demonstrated here, or by creating a JavaScript object and using it as a bridge via the WebBrowser.ObjectForScripting property (a hedged sketch of the injection approach follows after these options). This is very fragile: something as simple as the website changing an element's id could break it. You also need to make sure your code doesn't interfere with the functioning of the form (clashing function names, etc.).
Do it using a postback: monitor the communications between the web browser and the server (I personally prefer Firefox/Firebug, but IE/Fiddler and Chrome/F12 are both good too). As long as you can replicate the actions of the browser exactly, the server can't tell the difference. The problem here is that browsers are complex, and the more secure a form is, the more demanding the server is. This means you may have to fake a login, get cookies and send them back on subsequent requests, and handle ViewState data and XSS-prevention variables. It's possible, and it's far more robust than the first option, but it can be a pain to get working. If it's not a highly secure form, this is your best bet. More information here.
Do it by browser automation: Selenium is probably the best option here (as mentioned by others), but it suffers from a similar flaw to the WebBrowser control in that it's sensitive to changes on the form itself (though not as much so as the WebBrowser control).
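For the first option, here is a hedged sketch of script injection through the WebBrowser control. The element ids are hypothetical; InvokeScript("eval", ...) runs the snippet in the page's own context, so it can reach the page's DOM:

    using System.Windows.Forms;

    class OrderEntryBot
    {
        static void FillForm(WebBrowser browser)
        {
            browser.DocumentCompleted += (sender, e) =>
            {
                // Runs inside the loaded page, so it can fill and submit the form.
                browser.Document.InvokeScript("eval", new object[]
                {
                    "document.getElementById('orderQty').value = '5';" +
                    "document.getElementById('submitOrder').click();"
                });
            };
        }
    }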
Incidentally, if you have Visual Studio Ultimate/Test edition (and some others, I'm not sure which), it includes a suite of testing tools, including an excellent engine for automating load tests against a website. This is also superb for tracking down exactly what a form does, as you can see every step of the emulation.
Hope this helps
You have two choices, depending on the level of complexity you need:
Use an HTTP debugger like Fiddler to find out the POST data you need to send to each page and mimic it via an HttpWebRequest.
Use a browser automation tool like Selenium and let it do the job.
NOTE: Your actions may be considered spamming by the website, so be ready for IP blocking, CAPTCHAs...
You could give Selenium a go: http://seleniumhq.org/
UI automation is a far more intuitive approach to these types of tasks.
I have created an HTML5 page that provides important server-side functionality. Unfortunately, it must be run in an HTML5 browser (Chrome, IE9, or Firefox) with a canvas to produce the results I need. It is completely self-contained, taking the parameters it needs through the URL, and is ready to be closed when the OnLoad event is ready to send. So far so good.
The following process needs to be automated (no human eyes or interaction) and will be run from within a web service (not run from within a browser). Ideally, I don't want to waste extra cycles with busy wait, or delay the result by waiting for long time periods simply hoping the process has finished. I need to:
Open a browser (preferably Chrome) with a URL, using C#.
Wait for the page to completely finish loading - ideally receiving a callback of some kind.
Close the browser page when finished, again with C#.
We've tried using IE9. There is C# support to launch IE9, wait until it is not busy, and gracefully close the browser; however, the page loads resources asynchronously (there is no way around this), so we get the not-busy signal during the resource load instead of when the page has actually finished. Adding a busy wait would consume valuable server-side CPU cycles.
A simple CreateProcess call would be nice, but it would only work if the browser could close itself from some HTML; thanks to security measures in the browsers, I can't find a reliable way to use HTML commands to close a browser that was launched from the command line (I did see that you can close tabs spawned from an already opened page, Firefox only, but this doesn't help).
Does anyone know how I can accomplish this goal? Again - there is no human involvement in any of the process, no human eyes will ever see the page or interact with it in any way. The page only runs on the server machine, and will never be deployed to a client machine.
I would suggest using the WebBrowser control to load the HTML. Once you get the data back, use an ObjectForScripting to call a C# method that notifies you when it's done.
See http://www.codeproject.com/Tips/130267/Call-a-C-Method-From-JavaScript-Hosted-in-a-WebBro
You don't really even have to show the WebBrowser control.
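A minimal sketch of that pattern, assuming the page calls window.external.NotifyDone() when its OnLoad work completes (the method name and URL are placeholders):

    using System;
    using System.Runtime.InteropServices;
    using System.Windows.Forms;

    [ComVisible(true)]
    public class ScriptBridge
    {
        // The page invokes this via window.external.NotifyDone();
        public void NotifyDone()
        {
            Console.WriteLine("Page finished; safe to tear down.");
            Application.Exit();
        }
    }

    class Runner
    {
        [STAThread]
        static void Main()
        {
            var browser = new WebBrowser();              // never shown on screen
            browser.ObjectForScripting = new ScriptBridge();
            browser.Navigate("http://localhost/canvas-job.html?param=value");
            Application.Run();                           // pump messages until NotifyDone fires
        }
    }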
Let me know if you have any questions. Hope it helps!
Automating the browser: that's what Selenium does. I think it will be a good fit for the task, and there's good C# support. It can even run the browser on a remote machine using the Selenium RC server.
Let me rephrase the question...
Here's the scenario: as an insurance agent you are constantly working with multiple insurance websites. For each website I need to log in and pull up a client. I am looking to automate this process.
I currently have a solution built for iMacros but that requires a download/installation.
I'm looking for a solution using the .NET framework that will allow the user to provide their login credentials and information about a client and I will be able to automate this process for them.
This will involve knowledge of each specific website which is fine, I will have all of that information.
I would like this process to happen in the background and then present the website to the user once the action is performed.
You could try the following tools:
StoryTestIQ
Selenium
Watir
Windmill Testing Framework
Visual Studio Web Tests
They are automated testing tools/frameworks that allow you to write automated tests from a UI perspective and verify the results.
Use WatiN. It's an open-source .NET library to automate IE and Firefox. It's a lot easier than manipulating raw HTTP requests or hacking the WebBrowser control to do what you want, and you can run it from a console app or service, since you mentioned this won't be a WinForms app.
You can also make the browser window invisible if needed, since you mentioned only showing this to the user at a certain point.
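A hedged WatiN sketch of that flow (field names, URLs, and credentials are placeholders):

    using WatiN.Core;

    class InsuranceLogin
    {
        [System.STAThread]
        static void Main()
        {
            var browser = new IE("https://insurer.example.com/login");
            browser.TextField(Find.ByName("username")).TypeText("agent1");
            browser.TextField(Find.ByName("password")).TypeText("secret");
            browser.Button(Find.ByValue("Log in")).Click();

            // Pull up the client once logged in, then leave the browser
            // open for the user to take over from here.
            browser.GoTo("https://insurer.example.com/clients?id=12345");
        }
    }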
I've done this in the past using the WebBrowser control inside a WinForms app that I execute on the server. The WebBrowser control will allow you to access the HTML elements on the page, input information, click buttons/links, etc. It should allow you to accomplish your goal.
There are ways to do this without the WebBrowser control, look at the HTML Agility Pack.
Assuming that you are talking about filling and submitting a form or forms using a bot of some sort, then scraping the response to display to the user:
Use HttpWebRequest to create a form post containing the relevant form fields and data from your model, and submit the request.
Retrieve and analyse the response, storing any cookies, as you will need to resubmit them on the next request.
Formulate the next request based on the results of the first request (remembering to attach cookies as necessary) and submit it.
Retrieve the response and display it, or parse and display it (depending on what you are hoping to achieve).
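A hedged sketch of those steps, with a shared CookieContainer carrying the session between requests (URLs and form fields are illustrative):

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class FormBot
    {
        static void Main()
        {
            var cookies = new CookieContainer();

            // Step 1: submit the login form; session cookies land in the container.
            Post("https://target.example.com/login",
                 "username=agent1&password=secret", cookies);

            // Step 2: formulate the next request, reusing the same cookies.
            string clientPage = Post("https://target.example.com/clients/search",
                                     "clientName=Smith", cookies);
            Console.WriteLine(clientPage);
        }

        static string Post(string url, string formData, CookieContainer cookies)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "POST";
            request.ContentType = "application/x-www-form-urlencoded";
            request.CookieContainer = cookies;  // stores and resends cookies automatically

            byte[] body = Encoding.UTF8.GetBytes(formData);
            using (Stream stream = request.GetRequestStream())
                stream.Write(body, 0, body.Length);

            using (var reader = new StreamReader(request.GetResponse().GetResponseStream()))
                return reader.ReadToEnd();
        }
    }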
You say this is not a client app, so I will assume a web app. The downside of this is that once you start proxying requests for the user, you will always have to proxy those requests, as there is no way for you to transfer session cookies from the target site to the user, and no (simple/easy/logical) way for the user to log in to the target site and then transfer the cookie to you.
Usually, when trying to do this sort of integration, people use some form of published API for interacting with the companies/systems in question, as those APIs are designed for the type of interaction you are referring to.
It is not clear to me what difficulty you want to communicate when you wrote:
I currently have a solution built for iMacros but that requires a download/installation.
I think here lie some requirements you are not being explicit about. You certainly need to download/install your .NET program on your clients' machines, so what's the difference?
Anyway, Crowbar seems promising:
Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser. Its purpose is to allow running javascript scrapers against a DOM to automate web sites scraping but avoiding all the syntax normalization issues.
For people not familiar with this terminology: a "javascript scraper" here means something like an iMacros macro, used to extract information from a web site (in the end it is a JavaScript program; for what purpose you use it makes no difference, I think).
Design
Crowbar is implemented as a (rather simple, in fact) XULRunner application that provides an HTTP RESTful web service implemented in javascript (basically turning a web browser into a web server!) that you can use to 'remotely control' the browser.
I don't know if this headless browser can be extended with add-ons like a normal Firefox installation. If it can, you could even think about using your iMacros macros (or using CoScripter) with appropriate packaging.
The more I think about this, the more I feel that this is a convoluted solution for what you wrote you want to achieve. So, please, clarify.