Can the PageSource property in Selenium be updated as JavaScript loads data? - c#

I'm trying to determine if there's specific text on the page. I'm doing this:
public static void WaitForPageToLoad(this IWebDriver driver, string textOnPage)
{
var pageSource = driver.PageSource.ToLower();
var timeOut = 0;
while (timeOut < 60)
{
Thread.Sleep(1000);
if (pageSource.Contains(textOnPage.ToLower()))
{
timeOut = 60;
}
}
}
The problem is that the web driver's PageSource property isn't updated after the initial load. The page I'm navigating to loads a bunch of data via JS after the page has already loaded. I don't control the site, so I'm trying to figure out a method to get the updated HTML.

You are trying to solve the wrong problem. Instead of polling PageSource, wait for the text to appear using an XPath locator:
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(60));
var xpath = $"//*[contains(., '{textOnPage}')]";
wait.Until(ExpectedConditions.ElementIsVisible(By.XPath(xpath)));

Do you really need to search the entire page?
I'll refer you to this answer: https://stackoverflow.com/a/41223770/1387701
which uses this code (Java):
String Verifytext= driver.findElement(By.tagName("body")).getText().trim();
You can then check whether Verifytext contains the string you're looking for.
This works much better if you can narrow the location of the text down to a particular WebElement other than the body.
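The loop in the question reads PageSource once, before the loop starts, and never increments timeOut, so it can neither see later changes nor stop on its own. If polling the source is still preferred, a minimal sketch might look like the following. It assumes the source is supplied through a delegate (getSource, a hypothetical stand-in for () => driver.PageSource); the class and method names are illustrative only.

```csharp
using System;
using System.Threading;

public static class PollingSketch
{
    // Polls until the text returned by getSource contains `text`
    // (case-insensitive) or the timeout elapses.
    // getSource is a hypothetical stand-in for () => driver.PageSource.
    public static bool WaitForText(Func<string> getSource, string text,
                                   TimeSpan timeout, TimeSpan interval)
    {
        var deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            // Fetch a fresh copy of the source on every pass.
            if (getSource().IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
            if (DateTime.UtcNow >= deadline)
            {
                return false; // gave up: the text never appeared
            }
            Thread.Sleep(interval);
        }
    }
}
```

With Selenium, a call could look like PollingSketch.WaitForText(() => driver.PageSource, textOnPage, TimeSpan.FromSeconds(60), TimeSpan.FromSeconds(1)).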

Related

Scrape data from div in Windows.Form

I am new to C# programming. I am trying to scrape data from a div (I want to display the temperature from a web page in a Windows Forms application).
This is my code:
private void btnOnet_Click(object sender, EventArgs e)
{
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
var temperatura = doc.DocumentNode.SelectSingleNode("/html/body/div[1]/div[3]/div/section/div/div[1]/div[2]/div[1]/div[1]/div[2]/div[1]/div[1]/div[1]");
onet.Text = temperatura.InnerText;
}
This is the exception:
System.NullReferenceException:
temperatura was null.
You can use this:
public static bool TryGetTemperature(HtmlAgilityPack.HtmlDocument doc, out int temperature)
{
temperature = 0;
var temp = doc.DocumentNode.SelectSingleNode(
"//div[contains(@class, 'temperature')]/div[contains(@class, 'temp')]");
if (temp == null)
{
return false;
}
// The degree sign arrives HTML-encoded as "&deg;" (5 characters), so strip it before parsing.
var text = temp.InnerText.EndsWith("&deg;") ?
temp.InnerText.Substring(0, temp.InnerText.Length - 5) :
temp.InnerText;
return int.TryParse(text, out temperature);
}
With a relative XPath query you can select your target more precisely. With your absolute path, even a small change in the HTML structure will break your application. Some points:
// searches anywhere in the document.
The query looks for any div whose class contains "temperature" and, inside that node, a child div whose class contains "temp".
If that node is found (!= null), the method removes the degree sign and tries to convert the rest to an integer.
Then check the result:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlWeb web = new HtmlWeb();
doc = web.Load("https://pogoda.onet.pl/");
if (TryGetTemperature(doc, out int temperature))
{
onet.Text = temperature.ToString();
}
UPDATE
I updated TryGetTemperature slightly because the degree sign is encoded. The main problem is the HTML itself. The source you get from the request is HTML that the browser later updates dynamically, so it is of no use to you: it doesn't contain the temperature.
I see two alternatives:
You can use a browser control (Common Controls -> WebBrowser, in the Forms toolbox alongside Button, Label, and so on): insert it into your form and navigate to the page. It's not difficult, but you need to learn a few things: wait for the event that signals the page has been downloaded, then get the source code from the control. I suppose you'll also want to hide the browser control. Be careful: sometimes the browser doesn't work correctly when it is hidden. In that case, you can use a visible Form positioned outside the desktop and handle the activation events to keep that window from being activated. Also hide it from the task switcher (Alt+Tab). Things become harder this way, but sometimes it is the only way.
The simple way is to search for the location you want (e.g. Madryt) and look in DevTools at the request that is made (e.g. https://pogoda.onet.pl/prognoza-pogody/madryt-396099). Use this URL and you get valid HTML.
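The degree-sign handling described above can also be done by decoding the entity instead of counting characters. The following is a minimal sketch, not the answer's own method: it assumes the node text looks like "23&deg;" or "23°", and the class and method names are hypothetical. It uses only System.Net.WebUtility from the base class library.

```csharp
using System;
using System.Globalization;
using System.Net;

public static class TemperatureTextSketch
{
    // Parses text such as "23&deg;" or "-4°" into an integer.
    // Returns false (and temperature = 0) when the text is not a number.
    public static bool TryParseDegrees(string rawText, out int temperature)
    {
        // HtmlDecode turns "&deg;" into "°", so both forms end up identical.
        var decoded = WebUtility.HtmlDecode(rawText ?? string.Empty).Trim();
        var digits = decoded.TrimEnd('°').Trim();
        // Invariant culture keeps the parsing of "-" independent of the machine's locale.
        return int.TryParse(digits, NumberStyles.Integer,
                            CultureInfo.InvariantCulture, out temperature);
    }
}
```

Inside TryGetTemperature, this could replace the Substring logic with a call such as TemperatureTextSketch.TryParseDegrees(temp.InnerText, out temperature).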

Keep getting stuck loading when ScrapySharp NavigateToPage

My browser just keeps loading when I call NavigateToPage using ScrapySharp, and execution never reaches the next line of code. Below is my code, in a C# ASP.NET Web Forms application. Why does this happen? The link works and I can browse it manually. The code gets stuck at Browser.NavigateToPage(new Uri("http://www.asnb.com.my/v3_/asnbv2_0index.php")); and the browser keeps loading.
ScrapingBrowser Browser = new ScrapingBrowser();
Browser.AllowAutoRedirect = true;
Browser.AllowMetaRedirect = true;
WebPage PageResult = Browser.NavigateToPage(new Uri("http://www.asnb.com.my/v3_/asnbv2_0index.php"));
HtmlNode TitleNode = PageResult.Html.CssSelect(".navbar-brand").First();
I was having the same problem and decided not to use Browser.NavigateToPage, and instead to get the equivalent of PageResult.Html using an HtmlDocument.
For example:
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("http://www.asnb.com.my/v3_/asnbv2_0index.php");
HtmlNode TitleNode = doc.DocumentNode.CssSelect(".navbar-brand").First();
This should get you your expected results.
Move your call to a BackgroundWorker thread. Notice that at line 353 of ScrapingBrowser.cs (ScrapySharp/ScrapySharp/Network/ScrapingBrowser.cs), NavigateToPage() blocks on the result of the async version:
public WebPage NavigateToPage(Uri url, HttpVerb verb = HttpVerb.Get, string data = "", string contentType = null)
{
return NavigateToPageAsync(url, verb, data, contentType).Result;
}
I had the same problem. As soon as I moved the call to the DoWork method of my BackgroundWorker thread, it started behaving the way you would expect.
Another option is to use the async version of NavigateToPage, e.g.:
private async Task<WebPage> LoadPage(Uri uri)
{
WebPage page = await browser.NavigateToPageAsync(uri);
return page;
}
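One plausible reason the background-thread approach helps is that blocking on .Result from a thread that has a synchronization context (as ASP.NET Web Forms request threads do) can deadlock with the awaited continuation. The following is a minimal sketch of the same idea using Task.Run instead of a BackgroundWorker. FetchAsync is a hypothetical stand-in for a library call such as NavigateToPageAsync, and all names here are illustrative.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingCallSketch
{
    // Hypothetical stand-in for an async library call
    // such as browser.NavigateToPageAsync(uri).
    private static async Task<string> FetchAsync(Uri uri)
    {
        await Task.Delay(50); // simulates network latency
        return "content of " + uri;
    }

    // Starts the async call on a thread-pool thread, which by default has no
    // synchronization context to capture, and then blocks until it completes.
    public static string FetchBlocking(Uri uri)
    {
        return Task.Run(() => FetchAsync(uri)).GetAwaiter().GetResult();
    }
}
```

Where the calling code can itself be made async, awaiting the call directly avoids blocking a thread at all.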

Highlight text using Selenium

I have a context-sensitive menu that needs text to be highlighted in order for it to work. However, I'm having problems selecting the text using Selenium. I start by finding the WebElement I'm looking for, then try to interact with it using the different mouse events available.
When I try to select part of the text, it doesn't appear to do anything other than place the cursor at the end of the string.
This is what my textbox looks like:
This is what I need it to look like, or in other words, what I need Selenium to select (I did it manually for the purpose of illustration):
This is along the lines of what I'm trying to do in code:
public static async Task HighlightElementByCssSelector(this RemoteWebDriver @this, string cssSelector, TimeSpan? timeout = null, TimeSpan? interval = null)
{
var element = await @this.FindElementByCssSelectorAsync(".testmarker-registryentryedit .testmarker-title-field");
Actions action = new Actions(@this).MoveToElement(element).ClickAndHold(element).MoveByOffset(10,0).Release();
action.Build().Perform();
}
@this represents the driver in this case, and FindElementByCssSelectorAsync is part of a 'wrapper framework'.
I've tried different combinations of MoveToElement, DragAndDrop, ClickAndHold etc, but I just can't get it to work. Any pointers as to what might be wrong here?
According to what I understood about your problem I tried to solve it on my local machine (first day of vacation, lol). Sorry, I don't have VS on that machine so I wrote it in Java. The code should be self-explanatory:
@org.junit.Test
public void doTest(){
String query = "This is a test";
WebDriver driver = new FirefoxDriver();
driver.get("https://www.google.com");
WebElement searchBox = new WebDriverWait(driver, 10).until(ExpectedConditions.visibilityOfElementLocated(By.id("lst-ib")));
searchBox.sendKeys(query);
// make sure it has focus
searchBox.click();
Actions actions = new Actions(driver);
// go to the beginning of input
actions.sendKeys(Keys.HOME).build().perform();
int length = query.substring(0, query.indexOf("a")).length();
actions.keyDown(Keys.LEFT_SHIFT);
for (int i = 0; i < length; i++){
actions.sendKeys(Keys.ARROW_RIGHT);
}
actions.keyUp(Keys.LEFT_SHIFT);
actions.build().perform();
}
Results in:
Is this what you wanted?

Redirecting Virtual User to Content Editor Programmatically

I am trying to create a virtual user and redirect it to the Content Editor as below.
string userId = string.Format("{0}\\{1}", "sitecore", "testadmin");
var scUser = AuthenticationManager.BuildVirtualUser(userId, true);
scUser.RuntimeSettings.IsAdministrator = true;
scUser.RuntimeSettings.AddedRoles.Add(@"sitecore\Sitecore Client Authoring");
AuthenticationManager.Login(scUser);
string url = "/sitecore/shell/sitecore/content/Applications/Content Editor.aspx?id=%7b110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9%7d&la=en&fo=%7b110D559F-DEA5-42EA-9C1C-8A5DF7E70EF9%7d";
url = string.IsNullOrEmpty(url) ? "/" : url;
HttpContext.Current.Response.Redirect(url, false);
But it always redirects the user to the sitecore/login page.
Any idea what the issue is here?
Interesting. I'm not entirely sure that approach is a supported scenario. However, the Content Editor runs off the "shell" website, so that may be your issue.
Try wrapping your entire code block in this:
using(new SiteContextSwitcher("shell")) {
}
You need to change:
AuthenticationManager.Login(scUser);
to
AuthenticationManager.LoginVirtualUser(scUser);

How to get XML-code of webpage that is opened in IE (without using WebRequest)?

I'm trying to get the XML text from a web page that is already open in IE. Web requests are not allowed because of the target page's security (a long, boring story with certificates etc.). I use a method that walks through all open pages and, if I find a match with a page's URI, I need to get its XML.
Some time ago I needed to get the HTML code between the body tags. I used a method with IHTMLDocument2 like this:
private string GetSourceHTML()
{
Regex reg = new Regex(patternURL);
Match match;
string result;
foreach (SHDocVw.InternetExplorer ie in shellWindows)
{
match = reg.Match(ie.LocationURL.ToString());
if (!string.IsNullOrEmpty(match.Value))
{
mshtml.IHTMLDocument2 doc = (mshtml.IHTMLDocument2)ie.Document;
result = doc.body.innerHTML.ToString();
return result;
}
}
result = string.Empty;
return result;
}
So now I need to get the whole XML of the target page. I've googled a lot, but didn't find anything useful. Any ideas? Thanks.
Have you tried this? It should get the HTML, which hopefully you could parse to XML?
Retrieving the HTML source code
