Why is html rendered by PhantomJS incomplete with special characters?

While automating our web tests (C#, Selenium WebDriver 2.53.0, PhantomJS 2.1.1), I cannot locate some of the elements. When I looked at the innerHTML of the parent element, I saw that the HTML generated by PhantomJS is 1) incomplete and 2) contains special characters. See the excerpt:
<div class=\"buttons\">
<table cellspacing=\"0\" cellpadding=\"0\" width=\"100%\" cl=""
ass=\"button_line\">
\r\n\t\t\t\t\t\t\t<tbody>
<tr>
\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td></td>\r\n\t\t\t\t\t\t\t\t<td> </td>\r\n\t\t\t\t\t\t\t\t<td width=\"100%\"> </td>\r\n\t\t\t\t\t\t\t
</tr>\r\n\t\t\t\t\t\t
</tbody>
</table>\r\n\t\t\t\t\t\t
</div>"
In Chrome the HTML is more complete: for example, the table in the excerpt contains more child elements. Is PhantomJS too fast, so that some of the markup did not get the chance to be updated? (How can I force the update?) And why are those escape characters there? (How can I fix this?)
Update 1:
It was suggested that this question is answered by "How to wait for element to load in selenium webdriver?", but I think this is a different issue. As I noted in one of the comments, I'm already using ImplicitWait, which works fine for all other browsers. I even tried an explicit wait for the parts that are problematic in PhantomJS, but to no avail :(.
To add more info: the problematic tests have something to do with JavaScript and a refresh of the page in PhantomJS.
The usual scenario is selecting a row in some Overview table and then checking the values (fields) in the details section, which was empty until then (no fields displayed).

Related

Not recognizing <strong> tag in @Html.Raw in ASP.NET MVC C#

I am using ASP.NET MVC. When I use the <strong> tag in @Html.Raw, the tag does not take effect in the desired <div>.
As shown here:
<div class="mt-4 current-cursor">
@Html.Raw("<strong>OKK</strong> <p><ul><li style='font-size:18px;'>1.Test1</li><li>2.Test2</li></p>")
</div>
The result that it displays for me is as below, that is, it does not recognize the <strong> tag at all.

Html.Raw does not interpret anything at all. It just writes the given string unencoded into the output document.
So if it doesn't look right in your case, you possibly have some CSS in that page that causes it to look as it does. You could use F12 (Developer Tools, depending on your browser) to inspect the "OKK" element for details.
BTW, the other tags in your example also look wrong (which could also be an issue given existing CSS in the page).
In my case, for example, using some (other) arbitrary styles, your code looks like this:

Finding CSS selector path for Selenium C#

I am new to Selenium C# automation. I searched the web but did not find any help.
The HTML looks like this. I need to find the element and then click it using a CSS selector. The site only runs on IE.
<tbody>
<tr class="t-state-selected">
<td>Purchased</td>
<td class="">768990192</td>

I know web links can disappear, but here are a few I use when trying to figure out how to locate elements using Selenium's C# WebDriver:
https://automatetheplanet.com/selenium-webdriver-locators-cheat-sheet/
https://saucelabs.com/resources/articles/selenium-tips-css-selectors
https://www.packtpub.com/mapt/book/web_development/9781849515740/1
The bottom line is that you're selecting by id, class, or XPath. Each of these can be tested directly on the page using the F12 browser tools. For example, to find the first comment on your question above, you could try this in the console:
$x("//div[@id='mainbar']//tbody[@class='js-comments-list']/tr")
Here's another SO post with a quick and dirty answer.
And here is the official documentation from Selenium on how to locate UI elements.

To click on the number 768990192 which is dynamic we have to construct a CssSelector as follows :
driver.FindElement(By.CssSelector("tr.t-state-selected td:nth-of-type(2)")).Click();

You're really not giving us much info to work with. I will try my best to accommodate, even though the presented HTML is not enough to give an indication of the format and you've not presented any code from your current solution.
string url = "https://www.google.com";
IWebDriver driver = new InternetExplorerDriver();
driver.Navigate().GoToUrl(url);
driver.FindElement(By.XPath("//tr[@class='t-state-selected']")).Click();
This little code snippet:
Creates an Internet Explorer driver.
Goes to the URL of your choice.
And then clicks the first table row whose class equals "t-state-selected". My guess is that either all or none of the table rows have that class.

C# - Get JavaScript variable value using HTMLAgilityPack

I currently have 2 JavaScript variables whose values I need to retrieve. The HTML consists of a series of nested DIVs with no id/name attributes. Is it possible to retrieve the data from these variables using HTMLAgilityPack? If so, how would I go about doing so? If not, what would be required: regular expressions? If the latter, please help me create a regular expression that would allow me to do this. Thank you.
<div style="margin: 12px 0px;" align="left">
<script type="text/javascript">
variable1 = "var1";
variable2 = "var2";
</script>
</div>

I'm assuming you are trying to scrape this information from a website, most likely one you don't have direct control over. There are several ways to do this; I'll go from easy to hard (at least as I see them):
Ask the owner of the site. Most of the time they can give you direct access to the information, and if you ask nicely, they might just let you have it for free.
Use the WebBrowser control, run the JavaScript and then parse the values from the DOM afterwards. As opposed to HttpWebRequest, this allows all the proper values to be loaded on the page and scraped. Helpful Link Here.
Steal the source with Firebug. Inspect the website with Firebug to see which URLs are called in the background. Most likely, it's using an asynchronous request to retrieve the updated information from a web service. Using Firebug, you can view this under NET -> XHR. Look at the request and the values returned; you can then retrieve the values yourself and parse the contents from the source rather than scrape the page.
I think this might be the information you were looking for, but if not, let me know and I can clarify/fix the answer.
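Since the question also asks for a regular expression, here is a minimal sketch of that route. It assumes the script text has already been obtained somehow (for example from the script node via HTMLAgilityPack) and that the assignments have the simple form name = "value"; shown in the question. The class name ScriptVariableReader and the method Extract are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ScriptVariableReader
{
    // Matches simple assignments of the form: name = "value";
    // Assumption: values are double-quoted and contain no escaped quotes.
    private static readonly Regex Assignment = new Regex(
        @"(?<name>[A-Za-z_$][\w$]*)\s*=\s*""(?<value>[^""]*)""\s*;");

    // Returns a map from variable name to its string value.
    public static Dictionary<string, string> Extract(string scriptText)
    {
        var values = new Dictionary<string, string>();
        foreach (Match m in Assignment.Matches(scriptText))
        {
            values[m.Groups["name"].Value] = m.Groups["value"].Value;
        }
        return values;
    }

    public static void Main()
    {
        // Script body copied from the question.
        string script = "variable1 = \"var1\";\nvariable2 = \"var2\";";
        foreach (var pair in Extract(script))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

A regex like this is brittle: it will miss single-quoted strings, numbers, or values built at runtime, so it only fits pages where the script is as simple as the one above.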

HTML string does not get verified

These spaces were not added by me on the HTML side, and I cannot edit the HTML.
I want to know what my comparison string should be.
I am using WatiN to automate website testing, but there is one button I cannot find. Every other one works.
WatiN searches content by name, value, id and more, and that works fine. But when I look at the value of the submit button I need to click, it contains some &nbsp; entities, so I think they are playing some role.
Here is the html:
<span class='button'><input type="submit" value=" Login " /></span>
<span class='button'><input type="button" value=" Back " onclick="history.back(-1)" /></span>
and here is the code to search
browser.Button(WatiN.Core.Find.ByValue(" Login ")).Click();
what can be done??

-- Suggestion -- (i.e. too big for a comment)
You shouldn't use &nbsp; to add spaces to the submit button. Rather, you should use CSS to style the button to your liking. So you would have something like:
input[type=button] {
padding:10px;
min-width: 150px;
}
By the same token, this could eliminate any of the issues you're having with selecting the button. It could be an issue of encodings breaking with watin and as a result, doing this with CSS will make debugging the issue much cleaner and much easier.
Edit:
Have you tried searching by ID as opposed to by value? IDs are supposed to be unique on a page, so if it doesn't find the button by those means, then that's one issue that can be ruled out. It could also be the fact that you're searching for a button. A <button> is not the same as an <input type="button">.
Edit 2: Even though the issue was due to encodings breaking, I still recommend you reset the text of that button (removing all the non-breaking spaces) and attach an id/name to it. The reason is internationalization - if for some reason you modify the size of the button in the designer, or i18n the app and the text is different, your test will break.

You shouldn't use &nbsp; entities with WatiN.
This code will work, but you have to use the real non-breaking space character:
browser.Button(
WatiN.Core.Find.ByValue(
"   Login   ")).Click();
This is probably inconvenient, but you could use (after adding reference to System.Web) HttpUtility class:
browser.Button(
WatiN.Core.Find.ByValue(
System.Web.HttpUtility.HtmlDecode(
"&nbsp;&nbsp;&nbsp;Login&nbsp;&nbsp;&nbsp;"))).Click();
But, if I were you, I would just go with Regex:
browser.Button(
WatiN.Core.Find.ByValue(
new Regex(@"^\s*Login\s*$"))).Click();
or even new Regex("Login").
Interesting aside: if you ever have to use Find.ByText, you don't have to bother so much, and you can use a regular space (i.e. not exactly a non-breaking space). That's because the native IE IHTMLElement::getAttribute (http://msdn.microsoft.com/en-us/library/aa752280(VS.85).aspx) converts non-breaking spaces to regular spaces for the innertext attribute, but for value, id etc. it doesn't (&nbsp; is converted to a real non-breaking space, 0xA0).
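To see why the plain-space comparison fails while the regex works, here is a small standalone sketch that does not involve WatiN at all. It uses System.Net.WebUtility.HtmlDecode (rather than HttpUtility, so no System.Web reference is needed) and a single &nbsp; on each side as an assumed button value; the class name NbspDemo and its helper methods are hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class NbspDemo
{
    // Decodes HTML entities; "&nbsp;" becomes the character U+00A0.
    public static string Decode(string html)
    {
        return WebUtility.HtmlDecode(html);
    }

    // True when the value is "Login" surrounded by any whitespace,
    // which in .NET's default regex mode includes non-breaking spaces.
    public static bool LooksLikeLogin(string value)
    {
        return Regex.IsMatch(value, @"^\s*Login\s*$");
    }

    public static void Main()
    {
        string decoded = Decode("&nbsp;Login&nbsp;");

        // The first character is a non-breaking space, code 160 (0xA0).
        Console.WriteLine((int)decoded[0]);

        // Comparing against regular spaces fails ...
        Console.WriteLine(decoded == " Login ");

        // ... while \s matches both kinds of space.
        Console.WriteLine(LooksLikeLogin(decoded));
        Console.WriteLine(LooksLikeLogin(" Login "));
    }
}
```

This is the reason the regex variant is the most forgiving of the three: it keeps working whether the attribute value reaches the test with non-breaking or regular spaces.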

Wow, you really like spaces! I would remove those and use padding/margins, the way HTML and CSS were designed to be used. Then you won't need all those spaces, and you can assign a proper value to your button which WatiN will recognize.

I think it is because the &nbsp; in the HTML source is actually an escaped version of the special character that represents a non-breaking space. So in your C# source, you'll probably need that character instead of the HTML entity code. I think you can find the code of that character by using this button to submit a GET form. It will show the escaped character code in the URL.
Of course it is better not to put the spaces in there at all. You should give the button a padding using CSS instead.

Why is a meta refresh tag and title tag sitting outside of the <head> tag in ASP.NET?

When I render a page in ASP.NET, the following happens
</head>
<NOSCRIPT>
<meta http-equiv="REFRESH" content="0;URL=/Default.aspx?id=84&epslanguage=en-GB&jse=0" />
</NOSCRIPT>
<title>Page title goes here.</title>
<body>
My masterpage looks like this:
<title>Page title goes here.</title>
</head>
<body>
So what I'm asking is, where the heck has this refresh meta tag come from, why has it put it between my head tag and body tag, and why has my page title jumped outside of the head?!
When viewing the page's generated source in firebug, it shows the title tag and this new meta tag within the head tag, but viewing the source in any browser, it looks like the above. When using wget to scrape the page, it also comes out incorrectly as displayed above.
Any ideas why browsers may be interpreting this in different ways, and more importantly where this new meta tag has come from?
Thanks! Karl.
Edit:
Hi!
Thanks for your replies guys, very informative!
I've discovered that the problem is this line of code:
Page.Header.Controls.Add(ctrl);
Putting the mysterious meta tag in using this line puts it outside the head tag. When commenting this out, the title tag drops back into the right place, and all is well!
Any further thoughts?
Thanks!
Karl.

On the matter of why browsers interpret it differently, there are two answers. Firstly, the Firebug output is, as you say, generated source. That means it has gone through a certain amount of processing already, and clearly Firefox is doing some magic to say "Well, it's a meta and a title tag, they should be in the head, so I'll put them there."
For the other browsers it sounds like you are comparing their raw source, which is what the server sent before the browser has tried to make sense of it. I suspect you'd get the same if you viewed the raw source in Firefox (Ctrl+U).
I'd have expected all browsers to do much the same thing as you have described Firefox doing, but if not, that's not really something to be concerned about. When invalid HTML like this is received, the browsers have no real rules for what to do. This means that browsers are welcome to do whatever they want, from trying to guess what you meant to just ignoring it entirely.
As for what is causing it, the epslanguage query parameter is from EPiServer. I don't know whether it was in the request URL, so it may be that it is just being persisted, or it may be EPiServer trying to redirect to a page with an explicit language instead of just assuming the default. Unfortunately I'm not familiar with EPiServer, so I can't say anything more specific about that.
It is of course definitely the case that there is something on your server side that is causing this to happen.
Do you get that for all pages out of interest or just one specific one or just in one specific circumstance?

Quite often it's a case of an element not being properly closed. Most browsers will try to adjust the markup so that it makes sense, but in most cases the markup will be incorrectly parsed.
You should probably share more of your master page (and the web form using it)!

Maybe your HEAD-tag doesn't have runat="server"?
