Web automation using C# WebBrowser - c#

I'm in the very early stages of attempting to automate data entry and collection from a website. I have a 16,000 line CSV file. For each line, I'd like to enter data from that line into a textarea on a webpage. The webpage can then perform some calculations with that data and spit out an answer that I'd collect. Specifically, on the webpage http://www.mirbase.org/search.shtml, I'd like to enter a sequence in the sequence text box at the bottom, press the "Search miRNAs" button and then collect results on the next page.
My plan as of right now is to use a C# WebBrowser. My understanding is that I can access the individual elements in the HtmlDocument by id, name or coordinate. The last option is not ideal, because if I distribute this program to other people I can't be sure the elements would be at the same coordinates on their screens. As for the other two options, the textarea has a name, but it's the same as the form name, so I don't know how to access it. The button I'd like to click has neither a name nor an id.
Does anyone have any ideas as to how to access the elements I need? I am by no means set on this method, so if there's an easier/better way I'm certainly open to suggestions.

The WebBrowser class is not designed for this, which is why you are running into these problems.
You need to look into a tool that is designed for web automation.
Since you are using C#, Selenium has a wonderful set of C# bindings, and it can solve your problems because you'll be able to use different locators (specifically, locating an element by a CSS selector or XPath).
http://docs.seleniumhq.org/
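As a rough sketch of what the Selenium approach could look like for the page in the question: the CSS selector, the XPath expression and the sample sequence below are assumptions, not values taken from the real page, so inspect the page's HTML and adjust them. The code assumes the Selenium.WebDriver NuGet package and a ChromeDriver that matches the installed browser.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class MirbaseSearchSketch
{
    static void Main()
    {
        // IWebDriver is disposable; disposing it closes the browser.
        using (IWebDriver driver = new ChromeDriver())
        {
            driver.Navigate().GoToUrl("http://www.mirbase.org/search.shtml");

            // Assumed selector: a textarea whose name attribute is "sequence".
            IWebElement box = driver.FindElement(By.CssSelector("textarea[name='sequence']"));
            box.SendKeys("ACGUACGUACGUACGUACGU"); // placeholder sequence, not real data

            // Assumed XPath: a submit button located by its visible label,
            // which works even though the button has no name or id.
            IWebElement button = driver.FindElement(
                By.XPath("//input[@type='submit' and @value='Search miRNAs']"));
            button.Click();

            // Dump the text of the results page; real code would pick out
            // the specific table or elements that hold the answer.
            Console.WriteLine(driver.FindElement(By.TagName("body")).Text);
        }
    }
}
```

For the 16,000-line CSV, the same driver instance could be reused in a loop, navigating back to the search page for each line.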

Check out mshtml (see "Mshtml" on MSDN).
You can use it with the WebBrowser object.
Add a reference to Microsoft.mshtml to your project and a using mshtml; directive to your class.
Using mshtml you can easily set and get elements properties.
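A minimal sketch of how that might look, assuming a Windows Forms WebBrowser whose DocumentCompleted event has already fired. The helper name, the choice of "first textarea on the page" and the button label "Search miRNAs" are assumptions made for illustration; the real page may need a more specific match.

```csharp
using System.Windows.Forms;
using mshtml;

static class MshtmlSketch
{
    // Hypothetical helper: fills the first textarea and clicks the submit
    // button whose label matches. Call only after the page has loaded.
    public static void FillAndSubmit(WebBrowser browser, string sequence)
    {
        // DomDocument exposes the underlying COM document object.
        var doc = (IHTMLDocument3)browser.Document.DomDocument;

        // Assumption: the sequence box is the first textarea in the document.
        foreach (IHTMLElement el in doc.getElementsByTagName("textarea"))
        {
            ((IHTMLTextAreaElement)el).value = sequence;
            break;
        }

        // The button has no name or id, so match it by type and label instead.
        foreach (IHTMLElement el in doc.getElementsByTagName("input"))
        {
            var input = el as IHTMLInputElement;
            if (input != null && input.type == "submit" && input.value == "Search miRNAs")
            {
                el.click();
                break;
            }
        }
    }
}
```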

Related

Creating a custom pop up for Umbraco back office

[Final EDIT] Here is a link to the code I wrote in case it helps anyone.
I think I have a solution. Umbraco uses asp.net files for their popups
which is something I haven't used yet but I think I can get the hang
of it. I don't know how to access the aspx from within my class,
should I make a code behind partial class?
Thanks for any help.
I am developing a multi-lingual site, using Umbraco, where content nodes are automatically copied to each language as they are created. Is there any way to implement a custom popup to confirm that it should be copied to all instead?
This wouldn't actually be on the site, rather in the back office.
Or is it possible to open a browser popup with c# as all I really need is a bool value from a message box?
[EDIT: added possible solution]
I sorted this by adapting Umbraco's own create function. I made a new .aspx file and added the functionality that I needed to the code behind.
I was able to add a context menu item that allowed me to call my new page and from there called a method to duplicate the content.
From the method, I pass the new node and get the parent id. Then I compare all the node names for those that match and use the umbraco document.copy() method to recreate the content under each language at the correct position.
If I can make the code more generic then I will upload it as a package to Umbraco.

Can you Create a Keynote using the Revit API in C#

I'm trying to create a Keynote Tag via the Revit 2012 API. However, I found no reference to creating a Keynote Tag anywhere on the internet or in the samples. I see that the BuiltInCategory.OST_KeynoteTags is part of the IndependentTag class and according to http://thebuildingcoder.typepad.com/files/guide-to-placing-family-instances-with-the-api.doc you need to use the TM_ADDBY_CATEGORY TagMode to create a Keynote. However, when you then try to change the new Tag via ChangeTypeId, you get an error.
Has anyone figured this out?
I haven't had a chance to try yet, but I'm thinking you're out of luck.
For the most part, you can't do things with the API that you can't do interactively in Revit. I did quickly test that you can't change the type of a multi-category tag to be a keynote tag.
While they're both IndependentTag elements, they are in different Categories, and in my experience it's very rare that you can switch the category of a placed element.

How can I get a handle or object reference to an HtmlElement from a WebBrowser control HtmlDocument?

Think "Firebug", but entirely from C#.
I have a WebBrowser control that I've built a DOM tree for in a TreeView. I'd like to be able to set a link between each DOM element in the TreeView and its matching HtmlElement in the WebBrowser's Document so that when the node in the tree is clicked, the matching element in the Document highlights.
But, of course, the only availability on the surface for element access is GetElementById(), GetElementFromPoint() and GetElementsByTagName(). And, of course, not all web pages have Id's or Names associated with them. And since in my app's user experience the user won't be clicking the WebBrowser, but the TreeView, I don't have access to a Point either.
I'm experimenting with various options I've found in the API now. But it would be great if anyone out there has experience in this area. I can't seem to find detail on the web anywhere.
Thanx ahead of time!
Personally, I would use a JavaScript library such as jQuery to perform such a task. This library is easy to use, and plenty of examples and plug-ins are available (http://jquery.com). Using jQuery allows you not only to select elements by ID but also to grab them by CSS class, anchor type, etc. Essentially, anything you can pull from HTML/CSS you can pull with jQuery.
If you would like to handle an HtmlElement from the code-behind, you essentially have to assign it an ID as well as specify the RUNAT attribute. For example:
<textarea id="bodyText" runat="server"></textarea>
Hopefully this helps in some way!
Have you tried the All property?
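To illustrate the suggestion: HtmlDocument.All is a flat collection of every element in the loaded page, so it can be filtered with LINQ even when elements have no id or name. The helper below is a hypothetical sketch (its name and parameters are not from the original posts).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

static class DomLookupSketch
{
    // Returns every element with the given tag name whose attribute has the
    // given value. Call only after the document has finished loading.
    public static List<HtmlElement> FindByAttribute(
        WebBrowser browser, string tag, string attribute, string value)
    {
        return browser.Document.All
            .Cast<HtmlElement>() // HtmlElementCollection is non-generic
            .Where(e => string.Equals(e.TagName, tag, StringComparison.OrdinalIgnoreCase)
                     && e.GetAttribute(attribute) == value)
            .ToList();
    }
}
```

Each HtmlElement returned is a live reference, so one option for the TreeView scenario is to store it in the corresponding TreeNode's Tag property and read it back when the node is clicked.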

Any way to associate a HtmlElement (.NET) to a JavaScript element?

I'm trying to make an extended version of a WebBrowser with features like highlighting text and getting properties or attributes of elements for a web scraper. The WebBrowser functions don't help much at all, so if I could just find a way to get from an HtmlElement to a JavaScript element (like the one returned by document.getElementById) and back, and then add JavaScript functions to the HTML from my application, it would make the job a lot easier. Right now I'm manipulating the page's HTML programmatically from C# and it's very messy. I was thinking about setting a unique Id on each HTML element from my program and then calling the JavaScript document.getElementById to retrieve it. But that won't work: the elements might already have an Id assigned, and I would mess up their HTML code. I don't know if I can give them some made-up attribute like my_very_own_that_i_hope_no_web_page_on_the_world_ever_uses_attribute and then find some JavaScript function like getElementByWhateveAttributeIWant, but I'm not sure if this would work. I read something about expando or extended attributes in the DOM documentation on MSDN, but I'm not sure what that is about. Maybe some of you have a better way.
It would be much easier to use a rendering engine like Trident to get the data from the HTML document. Here is the link for Trident/MSHTML. You can search Google for samples in C#.
This is not nearly as hard as you imagine. You don't have to modify the document at all.
Once the WebBrowser has loaded a page, it's kept internally as a tree with the document node at the root. This node is available to your program, and you can find any element you want (or just enumerate them all) by walking the tree.
If you can give a concrete example, I can supply some code.
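Pending a concrete example, here is a minimal sketch of such a tree walk using the managed HtmlElement API. The class and method names are made up for illustration, and the highlight colour is arbitrary.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

static class DomWalkerSketch
{
    // Lazily yields every element below root, depth-first, without
    // modifying the document.
    public static IEnumerable<HtmlElement> Descendants(HtmlElement root)
    {
        foreach (HtmlElement child in root.Children)
        {
            yield return child;
            foreach (HtmlElement nested in Descendants(child))
                yield return nested;
        }
    }

    // Highlights every link under the document body.
    // Note: assigning Style replaces any existing inline style on the element.
    public static void HighlightLinks(WebBrowser browser)
    {
        foreach (HtmlElement element in Descendants(browser.Document.Body))
        {
            if (element.TagName == "A")
                element.Style = "background-color: yellow;";
        }
    }
}
```

The walk should start only after DocumentCompleted has fired, since Document.Body is not populated before then.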

Searching the name of web pages according to the word entered in a textbox

I have a textbox and a button on one page. I want to enter a word in the textbox and click the button. After clicking the button, I want to display the names of the web pages containing the word entered in the textbox. So please tell me how to do it? I am using C#.
So you want to create a search engine internal to your website. There are a couple of different options.
You can use something like Google Custom Search, which requires no coding and uses Google's technology, which I think we all agree does a pretty good job compared to other search engines. More information at http://www.google.com/cse/
Or you can implement it in .net which I will try to give some pointers about below.
A search engine in general consists of (some of) the following parts:
an index which is searched against
a query system which allows searches to be specified and results shown
a way to get documents into the index, like a crawler or some event that's handled when the documents are created/published/updated.
These are non trivial things to create especially if you want a rich feature set like stemming (returning documents containing plural forms of search terms), highlighting results, indexing different document formats like pdf, rtf, html etc... so you want to use something already made for this purpose. This would only leave the task of connecting and orchestrating the different parts, writing the flow control logic.
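To make the parts listed above concrete, here is a toy sketch of an index and a query method in plain C#. It is only an illustration of the idea: it does no stemming, ranking or file-format handling, and the page names and text in Main are invented sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TinySearchIndex
{
    // The index: maps each word (case-insensitively) to the pages containing it.
    private readonly Dictionary<string, HashSet<string>> _index =
        new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase);

    // Getting documents into the index: called whenever a page is created or updated.
    public void AddPage(string pageName, string text)
    {
        var separators = new[] { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };
        foreach (var word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            HashSet<string> pages;
            if (!_index.TryGetValue(word, out pages))
            {
                pages = new HashSet<string>();
                _index[word] = pages;
            }
            pages.Add(pageName);
        }
    }

    // The query system: returns the names of pages containing the exact word.
    public IEnumerable<string> Search(string word)
    {
        HashSet<string> pages;
        if (_index.TryGetValue(word, out pages))
            return pages.OrderBy(p => p);
        return Enumerable.Empty<string>();
    }

    static void Main()
    {
        var index = new TinySearchIndex();
        index.AddPage("About.aspx", "Our team builds search tools.");   // sample data
        index.AddPage("Contact.aspx", "Contact the team by email.");    // sample data

        foreach (var page in index.Search("team"))
            Console.WriteLine(page);
    }
}
```

In a button click handler, the textbox value would be passed to Search and the returned names bound to a list control.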
You could use Lucene.net, an open-source project with a lot of features. http://usoniandream.blogspot.com/2007/10/tutorial-implementing-lucenenet-search.html explains how to get started with it.
The other option is Microsoft Indexing Service, which comes with Windows, but I would advise against it since it's difficult to tweak to work the way you want and the results are sub-optimal in my opinion.
You are going to need some sort of backing store, and full text indexing. To the best of my knowledge, C# alone is not enough.
