Selenium to read the content from website - c#

I have content like this on a web page:
Appname | Description | Price | Part Number | Validity
App1 | some desc1 | 25 | JH32 | 30
App2 | some desc2 | 250 | PB36 | 180
App2 | some desc3 | 20 | QL76 | 10
App3 | some desc4 | 50 | KQ3J | 30
In my application, after the app starts, the user enters an app name and Selenium searches this site for that app name. What I want beyond this step is:
Whatever app name I search for, Selenium must retrieve the values of the corresponding fields, i.e. Price, Validity and Part Number. I tried to retrieve the values with Selenium by using attributes like class name, tag name, id etc., but each of these attributes is the same for all the fields, so Selenium can't tell which field to select.
The only thing I could find that differs is the inner text, which I can't use here since I can't predict what app name the user will type into the search box at the start of my application.
Here is my sample HTML code, which I got by clicking a field (Price) on my site using Firebug. I am using the Firefox browser with Selenium.
<td height="100%" class="ms-vb-title"><table height="100%" cellspacing="0" surl="" uis="512" cid="0x0100DFF86ACBE51BE549AA56639FCC32D7E0" ctype="Item" ms="0" csrc="" hcd="" couid="" otype="0" icon="icgen.gif||" ext="" type="" perm="0x1b03c4312ef" dref="sites/SoftwareDev/IAG/IAS/Lists/Unify Parts" url="/sites/SoftwareDev/IAG/IAS/Lists/Unify%20Parts/27_.000" id="27" ctxname="ctx1" onmouseover="OnItem(this)" class="ms-unselectedtitle"><tbody><tr><td width="100%" class="ms-vb"><a target="_self" onclick="GoToLink(this);return false;" href="/sites/SoftwareDev/IAG/IAS/Lists/Unify%20Parts/DispForm.aspx?ID=27" onfocus="OnLink(this)">MindMeister - 251-500 Pupil License<img height="1" border="0" width="1" alt="Use SHIFT+ENTER to open the menu (new window)." class="ms-hidden" src="/_layouts/images/blank.gif"></a></td><td><img width="13" alt="" style="visibility:hidden" src="/_layouts/images/blank.gif"></td></tr></tbody></table></td>
How can I achieve this? Any comments would be really appreciated.

Without seeing the HTML or your code, I would assume that the table you provided is laid out as an ordered <table><tr><td>...</td><td>...</td></tr></table>. Under this assumption, you could use the code you already have to find the app name, and then use WebDriver's XPath (or CSS selector) locators to find the appropriate data value by its <td> index within the <tr> that contains the app name element.
Once you have located the element, read its text and save it in a variable in your code.
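A minimal sketch of the index-based lookup described above, assuming the rows really are plain <tr>/<td> cells with the app name in the first column. So that it runs without a browser, it evaluates the XPath with .NET's System.Xml against a hypothetical, well-formed copy of the question's table; the class and method names are illustrative only. With Selenium you would pass the same XPath string to By.XPath and read .Text from the element it returns.

```csharp
using System.Xml;

// Sketch only: evaluates the XPath with System.Xml instead of a browser.
public static class AppTableLookup
{
    // Hypothetical well-formed copy of the table from the question.
    public const string SampleTable =
        "<table><tbody>" +
        "<tr><td>Appname</td><td>Description</td><td>Price</td><td>Part Number</td><td>Validity</td></tr>" +
        "<tr><td>App1</td><td>some desc1</td><td>25</td><td>JH32</td><td>30</td></tr>" +
        "<tr><td>App2</td><td>some desc2</td><td>250</td><td>PB36</td><td>180</td></tr>" +
        "<tr><td>App2</td><td>some desc3</td><td>20</td><td>QL76</td><td>10</td></tr>" +
        "<tr><td>App3</td><td>some desc4</td><td>50</td><td>KQ3J</td><td>30</td></tr>" +
        "</tbody></table>";

    // Builds the XPath: the row whose first cell is the app name, then the cell at the
    // 1-based column index. Assumes appName contains no apostrophe.
    public static string CellXPath(string appName, int column) =>
        $"//tr[normalize-space(td[1])='{appName}']/td[{column}]";

    // Returns the cell text, or null if no row matches. If several rows share the
    // app name (App2 does above), SelectSingleNode returns the first one.
    public static string GetCell(string xhtml, string appName, int column)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        XmlNode cell = doc.SelectSingleNode(CellXPath(appName, column));
        return cell?.InnerText;
    }
}
```

In this layout Price, Part Number and Validity are columns 3, 4 and 5, so something like driver.FindElement(By.XPath("//tr[normalize-space(td[1])='App1']/td[4]")).Text would be expected to give the part number. The real SharePoint list may order its columns differently, so check the DOM first.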

If all you have to go on is the app name, then you need to do it like this:
Identify the WebElement that is the table cell (td) containing the app name.
Use a relative xpath to identify the next table cell along.
Extract the text from that table cell.
Repeat steps 2-3 all along the row.
Here is some simplified webdriver code to show in principle how it works:
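using OpenQA.Selenium;
// The link text is the app name; FindElement returns the <a> element inside the cell
IWebElement appLink = driver.FindElement(By.LinkText("MindMeister - 251-500 Pupil License"));
// Relative XPath: up to the enclosing cell, then across to the next cell in the same row
IWebElement tdAppDescription = appLink.FindElement(By.XPath("./ancestor::td[1]/following-sibling::td[1]"));
string appDescription = tdAppDescription.Text;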
Here, the crucial bit is the relative XPath passed to the second FindElement call. It will only work on a very simple table structure (table/tbody/tr/td); yours is a lot more complicated, so you will need to work out the appropriate relative XPath for your website's structure. I recommend using a good browser's developer tools (e.g. in Firefox, use Firebug and FirePath) to take a good look at the DOM tree and figure out what kind of XPath you will need to go from one cell to the next.
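The four steps above can be sketched offline, assuming the simple table/tbody/tr/td layout the answer mentions and an app name that is the whole text of its cell. The sample row and names are hypothetical, and System.Xml stands in for the browser; in Selenium the equivalent of the loop would be appCell.FindElements(By.XPath("./following-sibling::td")).

```csharp
using System.Collections.Generic;
using System.Xml;

// Sketch: step 1 finds the app-name cell, steps 2-4 walk along the row with a relative XPath.
public static class RelativeCellWalk
{
    // Hypothetical simple row, the only layout this relative XPath is meant for.
    public const string SampleRow =
        "<table><tbody><tr><td>App3</td><td>some desc4</td><td>50</td><td>KQ3J</td><td>30</td></tr></tbody></table>";

    public static List<string> CellsAfter(string xhtml, string appName)
    {
        var values = new List<string>();
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        // Step 1: the cell whose whole text is the app name (assumes no apostrophe in appName).
        XmlNode appCell = doc.SelectSingleNode($"//td[normalize-space(.)='{appName}']");
        if (appCell == null) return values;
        // Steps 2-4: evaluated relative to appCell, returning each cell to its right in order.
        foreach (XmlNode cell in appCell.SelectNodes("following-sibling::td"))
            values.Add(cell.InnerText.Trim());
        return values;
    }
}
```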

Related

Finding CSS selector path for Selenium C#

I am new to Selenium C# automation. I tried searching the web but did not find any help.
The HTML code looks like this. I need to find the element using a CSS selector and then click it. The site only runs on IE.
<tbody>
<tr class="t-state-selected">
<td>Purchased</td>
<td class="">768990192</td>

I know web links can disappear, but here are a few I use when trying to figure out how to locate elements using Selenium's C# WebDriver:
https://automatetheplanet.com/selenium-webdriver-locators-cheat-sheet/
https://saucelabs.com/resources/articles/selenium-tips-css-selectors
https://www.packtpub.com/mapt/book/web_development/9781849515740/1
The bottom line is that you're selecting by id, class, or XPath. Each of these can be tested directly on the page using the F12 browser tools. For example, to find the first comment on your question above, you could try this in the console:
$x("//div[@id='mainbar']//tbody[@class='js-comments-list']/tr")
Here's another SO post with a quick and dirty answer.
And here is the official documentation from Selenium on how to locate UI elements.

To click the number 768990192, which is dynamic, we can construct a CssSelector as follows:
driver.FindElement(By.CssSelector("tr.t-state-selected td:nth-of-type(2)")).Click();

You're really not giving us much info to work with, but I will try my best to accommodate. The HTML you presented is not enough to show the page's structure, and you haven't presented any code from your current solution.
using OpenQA.Selenium;
using OpenQA.Selenium.IE;
string url = "https://www.google.com";
IWebDriver driver = new InternetExplorerDriver();
driver.Navigate().GoToUrl(url);
// Click the row whose class attribute is exactly "t-state-selected"
driver.FindElement(By.XPath("//tr[@class='t-state-selected']")).Click();
This little code snippet:
Creates an Internet Explorer driver.
Goes to the URL of your choice.
Clicks the table row whose class equals "t-state-selected", which, my guess is, will match either all of the table rows or none of them.
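Two details in the answers above can be checked offline. The last snippet clicks the whole row rather than the number, and @class='t-state-selected' only matches when the class attribute is exactly that string, whereas the CSS selector tr.t-state-selected matches it as one class among several. The sketch below assumes a well-formed copy of the posted fragment (with the missing closing tags added) and uses an XPath that is roughly equivalent to tr.t-state-selected td:nth-of-type(2) for cells that are direct children of the row; it is evaluated with System.Xml, and the names are hypothetical. In Selenium you would pass the same string to By.XPath.

```csharp
using System.Xml;

public static class SelectedRowCell
{
    // Hypothetical well-formed version of the fragment in the question.
    public const string Fragment =
        "<table><tbody>" +
        "<tr class=\"t-state-selected\"><td>Purchased</td><td class=\"\">768990192</td></tr>" +
        "</tbody></table>";

    // The contains/concat test matches "t-state-selected" as a whitespace-separated
    // class token, like CSS ".t-state-selected"; td[2] is the second cell of that row.
    public const string SecondCellXPath =
        "//tr[contains(concat(' ', normalize-space(@class), ' '), ' t-state-selected ')]/td[2]";

    // Returns the text of the second cell in the selected row, or null if there is none.
    public static string SecondCellText(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        return doc.SelectSingleNode(SecondCellXPath)?.InnerText;
    }
}
```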

Selenium - Discerning Between Identical <articles>, C#

I'm trying to have my program sit on a webpage and wait for a specific tagName within an article to appear. The problem is that I need Selenium to check that the article contains two tagNames before clicking it, and that's where I'm stumped. With my current code setup, it doesn't click anywhere; it just sits on the page, I suspect because there's more than one article with the same main tagName that I'm trying to find. Here's the HTML:
<article>
<div class ="inner-article">
<a href ="/shop/shirts/iycbmgtqw/x9vdawcjg" style="height:150px;">
<img alt="Xrtqh7ar444" height="150" src="//d17ol771963kd3.cloudfront.net/120885/vi/xrTQH7Ar444.jpg" width="150">
</a>
<h1>
EXAMPLE_CODE
</h1>
<p>
EXAMPLE_COLOUR
</p>
</div>
</article>
All other items on this page have an identical class, and some have identical tagNames. I want to detect when an article contains a specific combination of two tagNames. I realize XPath is an option, but I would like to write the code without knowing an XPath in advance, since the name of the item is the only available information.
And here's the code I'm working with at the moment:
driver.Manage().Timeouts().ImplicitlyWait(TimeSpan.FromMinutes(10));
IWebElement test = driver.FindElement(By.TagName(textBox12.Text));
test.Click();
where textBox12.Text is "EXAMPLE_CODE". Am I correct in assuming that WebDriver doesn't click anything because there is more than one element with the tagName "EXAMPLE_CODE"? And is there a way to make it first look for "EXAMPLE_CODE" and then check the secondary value "EXAMPLE_COLOUR"?
Thanks!!

You are using By.TagName incorrectly. The tag name is the type of element you are trying to find: for the link it is 'a', and for a div it is 'div'. The correct way to find a link by tag name would be By.TagName("a").
You need to match on text, so you will need to use XPath. Assuming that the code is unique, you could try:
XPath to get the link for the code: //div[@class='inner-article'][normalize-space(h1)='EXAMPLE_CODE']/a
XPath to get the link only when the colour matches too: //div[@class='inner-article'][normalize-space(h1)='EXAMPLE_CODE'][normalize-space(p)='EXAMPLE_COLOUR']/a
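A minimal sketch of the "both values must match" check, assuming the structure posted in the question (<h1> holds the code, <p> holds the colour, both surrounded by line breaks). normalize-space(h1) uses the heading's full text, so it works whether the code is plain text or wrapped in a link. The page string and names are hypothetical, and System.Xml stands in for the browser. In the real program you would pass the string to By.XPath, e.g. driver.FindElement(By.XPath(ArticleFinder.LinkXPath(textBox12.Text, colourText))).Click(), where colourText is a hypothetical second input; with the implicit wait already set, FindElement keeps polling until such an article appears or the timeout expires.

```csharp
using System.Xml;

// Sketch: find the article whose <h1> and <p> both match, and return its link.
public static class ArticleFinder
{
    // Hypothetical page with two articles sharing the same code but different colours,
    // based on the posted HTML; <img> is self-closed so XmlDocument can load it.
    public const string Page =
        "<body>" +
        "<article><div class=\"inner-article\"><a href=\"/shop/shirts/other\">other</a>" +
        "<h1>\n EXAMPLE_CODE\n</h1><p>\n OTHER_COLOUR\n</p></div></article>" +
        "<article><div class=\"inner-article\">" +
        "<a href=\"/shop/shirts/iycbmgtqw/x9vdawcjg\"><img alt=\"Xrtqh7ar444\" /></a>" +
        "<h1>\n EXAMPLE_CODE\n</h1><p>\n EXAMPLE_COLOUR\n</p></div></article>" +
        "</body>";

    // normalize-space() strips the line breaks around the text.
    // Assumes neither value contains an apostrophe.
    public static string LinkXPath(string code, string colour) =>
        $"//div[@class='inner-article'][normalize-space(h1)='{code}'][normalize-space(p)='{colour}']/a";

    // Returns the href of the matching link, or null if no article has both values.
    public static string FindHref(string xhtml, string code, string colour)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);
        var link = doc.SelectSingleNode(LinkXPath(code, colour)) as XmlElement;
        return link?.GetAttribute("href");
    }
}
```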

Why is html rendered by PhantomJS incomplete with special characters?

When automating our tests for web (C#, Selenium WebDriver 2.53.0, PhantomJS 2.1.1), I cannot locate some of the elements. When I looked at the innerHTML of the parent element, I saw that the HTML generated by PhantomJS is 1) incomplete and 2) contains special characters; see the excerpt.
<div class=\"buttons\">
<table cellspacing=\"0\" cellpadding=\"0\" width=\"100%\" class=\"button_line\">
\r\n\t\t\t\t\t\t\t<tbody>
<tr>
\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td width=\"100%\"> </td>\r\n\t\t\t\t\t\t\t
</tr>\r\n\t\t\t\t\t\t
</tbody>
</table>\r\n\t\t\t\t\t\t
</div>"
With Chrome the HTML is more complex; for example, the table in the excerpt has more child elements. Is PhantomJS too fast, so that some of the content did not get the chance to be updated? (How can I force this?) And why are those characters there? (How can I fix this?)
Update 1:
It was suggested that this question might be answered by "How to wait for element to load in selenium webdriver?", but I think this is a different issue. As I mentioned in one of the comments, I'm already using an implicit wait, which works fine for all other browsers. I even tried an explicit wait for the parts that are problematic in PhantomJS, but to no avail :(.
To add more info: the problematic tests involve JavaScript and page refreshes in PhantomJS.
The usual scenario is selecting a row in an Overview table and then checking the values (fields) in the details section, which is empty (no fields displayed) until then.

C# - Get JavaScript variable value using HTMLAgilityPack

I currently have 2 JavaScript variables whose values I need to retrieve. The HTML consists of a series of nested DIVs with no id/name attributes. Is it possible to retrieve the data from these variables using HTMLAgilityPack? If so, how would I go about doing so? If not, what would be required: regular expressions? If the latter, please help me create a regular expression that would allow me to do this. Thank you.
<div style="margin: 12px 0px;" align="left">
<script type="text/javascript">
variable1 = "var1";
variable2 = "var2";
</script>
</div>

I'm assuming you are trying to scrape this information from a website, most likely one you don't have direct control over? There are several ways to do this; I'll go from easy to hard (at least as I see them):
Ask the owner (of the site). Most of the time they can give you direct access to the information, and if you ask nicely, they might just let you have it for free.
Use the WebBrowser control, run the JavaScript and then parse the values from the DOM afterwards. As opposed to HttpWebRequest, this allows all the proper values to be loaded on the page and scraped. Helpful Link Here.
Steal the source with Firebug. Inspect the website with Firebug to see which URLs are called in the background. Most likely, it's using an asynchronous request to retrieve the updated information from a web service. In Firebug you can view this under NET -> XHR. Look at the request and the values returned; you can then retrieve the values yourself and parse the contents from the source rather than scrape the page.
I think this might be the information you were looking for, but if not, let me know and I can clarify/fix the answer.
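The question also asked for a regular expression. A minimal sketch, assuming the values are plain double-quoted string literals with no escaped quotes inside, as in the posted HTML; the class name is hypothetical. If you load the page with HtmlAgilityPack, you could first narrow the input to the script element's InnerText (for example via DocumentNode.SelectNodes("//script")) and run the same expression on that; the code below simply scans the raw HTML string.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch: pulls   name = "value";   assignments out of HTML/script text.
public static class ScriptVariables
{
    // name: a simple identifier (letter or underscore, then word characters);
    // value: a double-quoted literal without embedded or escaped quotes;
    // the statement must end with a semicolon, which rules out HTML attributes like align="left".
    private static readonly Regex Assignment =
        new Regex(@"\b(?<name>[A-Za-z_]\w*)\s*=\s*""(?<value>[^""]*)""\s*;");

    public static Dictionary<string, string> Extract(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Assignment.Matches(text))
            result[m.Groups["name"].Value] = m.Groups["value"].Value;
        return result;
    }
}
```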

c# webbrowser show what you want

I want to create a small web browser, tiny and fast,
but I have a problem.
Let me explain:
1 - The user enters a site: google.com
2 - The C# program gets google.com
3 - It finds <td nowrap="" align="center">
4 - The web browser shows only that area
I don't know where to start.
Thanks

OK, I'm going to try to answer your question, though I'm partly deciphering it as well.
Create a WebBrowser control on your form (2.0 is fine for what you need) and call .Navigate("http://www.google.com");
Get the source code from the Document. You can do this as follows: string source = _WebBrowser.Document.Body.OuterHtml;
Use string manipulation to get to the area of the page you need, for instance with the String.Substring() method.
Save the text into a file or stream and load it into the WebBrowser control, or replace the page's Document HTML with just the HTML you want to show.
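The string-manipulation step can be sketched with plain String methods, assuming the wanted cell contains no nested <td> (the search stops at the first </td> after the start tag); the class and method names are hypothetical. The markup read back from Document.Body.OuterHtml is IE's re-serialization and may spell the tag differently from the original source (for example in upper case or without quotes), so the search ignores case and you may still need to adjust startTag after looking at the actual string. Also read the source in the DocumentCompleted event, since Navigate() returns before the page has loaded.

```csharp
using System;

public static class CellExtractor
{
    // Returns the substring from the first occurrence of startTag through the next "</td>",
    // or null if either is missing. Assumes the cell contains no nested <td>.
    public static string ExtractCell(string source, string startTag)
    {
        int start = source.IndexOf(startTag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;
        const string endTag = "</td>";
        int end = source.IndexOf(endTag, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;
        return source.Substring(start, end + endTag.Length - start);
    }

    // Wraps the fragment in a minimal page; a <td> only renders properly inside a table row.
    public static string WrapAsPage(string cellHtml) =>
        "<html><body><table><tr>" + cellHtml + "</tr></table></body></html>";
}
```

The result could then be shown with something like _WebBrowser.DocumentText = CellExtractor.WrapAsPage(cell); note that setting DocumentText raises DocumentCompleted again, so guard the handler against reprocessing its own output.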

Okay! Looking at the comment, it seems you want to request a page using C# and show only one part of it; in your case, that specific <td>. Please correct me if I am wrong.
Other than what Kyle has mentioned, check out the HTML Agility Pack. It might be of interest to you.
