I'm trying to make an extended version of a WebBrowser with stuff like highlighting text and getting properties or attributes of elements for a Web Scraper. WebBrowser functions doesn't help much at all, so if I could just find a way from HtmlElement to a JavaScript element (like the one returned by document.getElementById), and back, and then add JavaScript functions to the HTML from my application, it would make the job a lot easier. Right now I'm messing with the HTML of the code programmatically from C# and it's very messy. I was thinking about setting some unique Id to each HTML element from my program and then call the JavaScript document.getElementById to retrieve it. But that won't work, they might already have an Id assigned and I will mess up their HTML code. I don't know if I can give them some made up attribute like my_very_own_that_i_hope_no_web_page_on_the_world_ever_uses_attribute and then figure out if there is some JavaScript function getElementByWhateveAttributeIWant but I'm not sure if this would work. I read something about expansion or extended attributes on the DOM documentation in msdn but I'm not sure what that is about. Maybe some of you guys have a better way.
It would be much easier to use some rendering engine like trident to get the data from html document. Here is the Link for trident/MSHTML. you can do google and can have samples in c#
This is not nearly as hard as you imagine. You don't have to modify the document at all.
Once the WebBrowser has loaded a page, it's kept internally as a tree with the document node at the root. This node is available to your program, and you can find any element you want (or just enumerate them all) by walking the tree.
If you can give a concrete example, I can supply some code.
Related
I'm in the very early stages of attempting to automate data entry and collection from a website. I have a 16,000 line CSV file. For each line, I'd like to enter data from that line into a textarea on a webpage. The webpage can then perform some calculations with that data and spit out an answer that I'd collect. Specifically, on the webpage http://www.mirbase.org/search.shtml, I'd like to enter a sequence in the sequence text box at the bottom, press the "Search miRNAs" button and then collect results on the next page.
My plan as of right now is to use a C# WebBrowser. My understanding is that I can access the individual elements in the HtmlDocument either by id, name or coordinate. The last option is not ideal, because if I distribute this program to other people I can't be sure they'd be using at the same coordinates. As for the other 2 options, the textarea has a name, but it's the same as the form name, so I don't know how to access it. The button I'd like to click has neither a name nor an id.
Does anyone have any ideas as to how to access the elements I need? I am by no means set on this method, so if there's an easier/better way I'm certainly open to suggestions.
The WebBrowser class is not designed for this, hence why you are coming up with your problems.
You need to look into a tool that is designed for web automation.
Since you are using C#, Selenium has a wonderful set of C# bindings, and it can solve your problems because you'll be to use different locators (locating an element by a CSS selector or XPath specifically).
http://docs.seleniumhq.org/
Check mshtml - Mshtml on msdn
You can use it with the WebBrowser object.
Add Microsoft.mshtml reference to your project and the using mshtml declaration in your class.
Using mshtml you can easily set and get elements properties.
I am trying to automate the testing of web forms. To that end I need to know how to use C# to dynamically locate input tags within the HTML page then assign values to them. I don't want to use XPath, because each time I will be using a different web form. I want to pass the web form's URL to Selenium and then automatically populate the fields. I've heard of HTMLAgilityPack. Would that help me? If so, how can I use it?
I appreciate your help.
I may have missed a crucial part of your question, however, have you looked at Selenium WebDriver?
If you write a test that handles a generic web form you can back your test by data that is dynamic. Therefore you can cater for changes in the page by using Data Driven Tests. I've written tests for many pages and there are always common actions, but I cater for each page differently though as there are different things on that page!
[EDIT]
Following on from your comments, I think looking into Selenium would be a good idea. The way to handle different pages is to have these element definitions ready in a 'definitions' class for each page. That way once you know what the page is, you just use the correct class for your definitions. It is best to know what elements you are going to be interacting with in your tests before the tests run. The point of automated UI testing is for a known set of actions to be performed and a correct result achieved.
I would suggest you look up some tutorials such as this and you can see my blog
though I wrote this when I was initially learning WatiN and then replaced it with Selenium (I like it better :P).
Html Agility Pack
This is an agile HTML parser that builds a read/write DOM and supports
plain XPATH or XSLT (you actually don't HAVE to understand XPATH nor
XSLT to use it, don't worry...). It is a .NET code library that allows
you to parse "out of the web" HTML files. The parser is very tolerant
with "real world" malformed HTML. The object model is very similar to
what proposes System.Xml, but for HTML documents (or streams).
HtmlDocument doc = new HtmlDocument();
doc.Load(path);
foreach (HtmlNode input in doc.DocumentNode.SelectNodes("//input"))
// Your Code...
I'm trying to build a headless browser in c#. c# has plenty of classes, which are supposed to make this possible, like, for example JScriptCodeProvider.
I am looking to get IE XML DOM classes for the JavaScript code to work with. Can anyone tell me where to find those, and, if possible, to provide me with a workable example for what I'm trying do to?
Use the webbrowser control. That should get you everything you need.
I wish to read the two Combobox lists from this site:
http://coinmill.com/
Then I wish to recreate that in C# 2010 Combobox.
Thank you.
Use the HtmlAgilityPack to download and parse the page at coinmill. Write an xpath expression or traverse the DOM to find the select tag you´re looking for in the HtmlAgilityPack document. Get the list of options, and load them into your list.
Mind you: this code can (and probably will) break every time coinmill updates their website. Your xpath or dom traversing is basing itself on the current layout of that page, and it will change. You would be better off looking for a webservice that offers the data you are looking for.
Menno
Think "Firebug", but entirely from C#.
I have a WebBrowser control that I've built a DOM tree for in a TreeView. I'd like to be able to set a link between each DOM element in the TreeView and its matching HtmlElement in the WebBrowser's Document so that when the node in the tree is clicked, the matching element in the Document highlights.
But, of course, the only availability on the surface for element access is GetElementById(), GetElementFromPoint() and GetElementsByTagName(). And, of course, not all web pages have Id's or Names associated with them. And since in my app's user experience the user won't be clicking the WebBrowser, but the TreeView, I don't have access to a Point either.
I'm experimenting with various options I've found in the API now. But it would be great if anyone out there has experience in this area. I can't seem to find detail on the web anywhere.
Thanx ahead of time!
Personally, I would utilize a JavaScript library such as JQuery to perform such a task. This library is easy to use and plenty of examples/plug-ins available (http://jquery.com). Using JQuery allows you not only to use IDs but also grab them by CSS class, anchor type, etc. Essentially, anything you can pull from HTML/CSS you can pull with JQuery.
If you would like to handle a HtmlElement from the code-behind you essentially have to assign it an ID as well as specify the RUNAT attribute. For example:
<textarea id="bodyText" runat="server"></textarea>
Hopefully this helps in some way!
Have you tried the All property?