Make selenium code with a lot of XPath-s more readable

I've noticed that when I use Selenium WebDriver to interact with elements on a web page, my code becomes hard to read because I use a lot of XPaths to find these elements. For example:
driver.FindElement(By.XPath("//div[@class='login']//a[@href='#']"), Globals.TIMEOUT).Click();
var loginField = driver.FindElement(By.XPath("//div[@id='login_box']//input[@name='login_name']"));
jdriver.ExecuteScript("arguments[0].setAttribute('value', '" + login + "')", loginField);
var passwordField = driver.FindElement(By.XPath("//div[@id='login_box']//input[@name='login_password']"));
jdriver.ExecuteScript("arguments[0].setAttribute('value', '" + password + "')", passwordField);
driver.FindElement(By.XPath("//div[@id='login_box']//input[@type='submit']")).Click();
driver.FindElement(By.XPath("//div[@class='nameuser']"), Globals.TIMEOUT);
I thought about placing the XPath values in constant strings, but it is very helpful to see the actual XPath of an element while reading the code. On the other hand, when the XPath of an object changes, I have to change it everywhere it is used.
So what is the best solution to this problem?

Use the Selenium Page Object Model, optionally with Page Factory. It keeps the locators in one place, which keeps the test code clean and readable.
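As a rough illustration of the Page Object idea, here is a minimal sketch for the login form from the question. The class name LoginPage and the method LogIn are made up for this example, and it types into the fields with SendKeys instead of the question's ExecuteScript calls and custom timeout overload, so treat it as a starting point rather than a drop-in replacement.

```csharp
using OpenQA.Selenium;

// Hypothetical page object: every locator for the login form lives here,
// named once, with the actual XPath still visible next to the name.
public class LoginPage
{
    private static readonly By OpenLoginLink = By.XPath("//div[@class='login']//a[@href='#']");
    private static readonly By LoginNameField = By.XPath("//div[@id='login_box']//input[@name='login_name']");
    private static readonly By PasswordField = By.XPath("//div[@id='login_box']//input[@name='login_password']");
    private static readonly By SubmitButton = By.XPath("//div[@id='login_box']//input[@type='submit']");

    private readonly IWebDriver driver;

    public LoginPage(IWebDriver driver)
    {
        this.driver = driver;
    }

    // Opens the login box, fills in the credentials and submits the form.
    public void LogIn(string login, string password)
    {
        driver.FindElement(OpenLoginLink).Click();
        driver.FindElement(LoginNameField).SendKeys(login);
        driver.FindElement(PasswordField).SendKeys(password);
        driver.FindElement(SubmitButton).Click();
    }
}
```

A test then reads as new LoginPage(driver).LogIn(login, password); and an XPath change only touches this one class.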

Create a page object file.
For example, if you use an XPath like "//div[@id='login_box']//input[@type='submit']" a lot, put this in the page object file:
var loginSubmit = "//div[@id='login_box']//input[@type='submit']"
Then in your main file import the page object file:
using myPageObjectFile
driver.FindElement(By.XPath(myPageObjectFile.loginSubmit));
My C# is not great so it might not be like this exactly. But something to that effect should work.
This way when the xPath changes, you only need to adjust it in the page object file.
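In valid C# the idea above might look roughly like the following sketch. The class name LoginPageLocators and the constant names are invented for the example; the shared LoginBox prefix is an extra step that is not in the answer, added so that a change to the container only needs one edit.

```csharp
// Hypothetical "page object file": XPath strings are defined once and reused.
public static class LoginPageLocators
{
    // Shared container prefix for everything inside the login box.
    private const string LoginBox = "//div[@id='login_box']";

    public const string LoginName = LoginBox + "//input[@name='login_name']";
    public const string LoginPassword = LoginBox + "//input[@name='login_password']";
    public const string LoginSubmit = LoginBox + "//input[@type='submit']";
}
```

The calling code would then be driver.FindElement(By.XPath(LoginPageLocators.LoginSubmit)).Click();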

Related

Detect if html string contains javascript [duplicate]

Is there a library or acceptable method for sanitizing the input to an html page?
In this case I have a form with just a name, phone number, and email address.
Code must be C#.
For example:
"<script src='bobs.js'>John Doe</script>" should become "John Doe"

We are using the HtmlSanitizer .NET library, which:
Is open-source (MIT) - GitHub link
Is fully customizable, e.g. you can configure which elements should be removed (see the wiki)
Is actively maintained
Doesn't have the problems of the Microsoft Anti-XSS library
Is unit tested with the OWASP XSS Filter Evasion Cheat Sheet
Is purpose-built for this (in contrast to HTML Agility Pack, which is a parser, not a sanitizer)
Doesn't use regular expressions (HTML isn't a regular language!)
Is also available on NuGet
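For orientation, basic use of the library is roughly the sketch below. It assumes the HtmlSanitizer NuGet package is installed and that its namespace is Ganss.Xss, as in recent releases (older releases may differ); the wrapper class InputCleaner is made up for the example. What happens to the text inside a removed tag depends on the sanitizer's configuration, so check the output for your own input rather than assuming it will be "John Doe".

```csharp
using Ganss.Xss; // assumption: namespace used by recent HtmlSanitizer releases

// Hypothetical helper that runs untrusted form input through HtmlSanitizer.
public static class InputCleaner
{
    public static string Clean(string untrustedHtml)
    {
        // Default configuration: only the library's built-in allow-list of
        // tags, attributes and CSS properties survives.
        var sanitizer = new HtmlSanitizer();
        return sanitizer.Sanitize(untrustedHtml);
    }
}
```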

Based on the comment you made to this answer, you might find some useful info in this question:
https://stackoverflow.com/questions/72394/what-should-a-developer-know-before-building-a-public-web-site
Here's a parameterized query example. Instead of this:
string sql = "UPDATE UserRecord SET FirstName='" + txtFirstName.Text + "' WHERE UserID=" + UserID;
Do this:
SqlCommand cmd = new SqlCommand("UPDATE UserRecord SET FirstName = @FirstName WHERE UserID = @UserID");
cmd.Parameters.Add("@FirstName", SqlDbType.VarChar, 50).Value = txtFirstName.Text;
cmd.Parameters.Add("@UserID", SqlDbType.Int).Value = UserID;
Edit: Since there was no injection, I removed the portion of the answer dealing with that. I left the basic parameterized query example, since that may still be useful to anyone else reading the question.
--Joel

It sounds like you have users that submit content but you cannot fully trust them, and yet you still want to render the content they provide as super safe HTML. Here are three techniques: HTML encode everything, HTML encode and/or remove just the evil parts, or use a DSL that compiles to HTML you are comfortable with.
1. Should it become "John Doe"? I would HTML encode that string and let the user, "John Doe" (if indeed that is his real name...), have the stupid-looking name <script src='bobs.js'>John Doe</script>. He shouldn't have wrapped his name in script tags, or any tags, in the first place. This is the approach I use in all cases unless there is a really good business case for one of the other techniques.
2. Accept HTML from the user and then sanitize it (on output) using a whitelist approach, like the sanitization method @Bryant mentioned. Getting this right is (extremely) hard, and I defer pulling that off to greater minds. Note that some sanitizers will HTML encode evil where others would have removed the offending bits completely.
3. Another approach is to use a DSL that "compiles" to HTML. Make sure to whitehat your DSL compiler, because some (like MarkdownSharp) will allow arbitrary HTML such as <script> tags and evil attributes through unencoded (which, by the way, is perfectly reasonable but may not be what you need or expect). If that is the case, you will need to use technique #2 and sanitize what your compiler outputs.
Closing thoughts:
If there is not a strong business case for technique #2 or #3, then reduce risk and save yourself the effort and the worry: go with technique #1.
Don't assume you're safe because you used a DSL. For example, the original implementation of Markdown allows HTML through, unencoded: "For any markup that is not covered by Markdown’s syntax, you simply use HTML itself. There’s no need to preface it or delimit it to indicate that you’re switching from Markdown to HTML; you just use the tags."
Encode when you output. You can also encode input, but doing so can put you in a bind. If you encoded incorrectly and saved that, how will you get the original input back so that you can re-encode it after fixing the faulty encoder?

If by sanitize you mean REMOVE the tags entirely, the RegEx example referenced by Bryant is the type of solution you want.
If you just want to ensure that the code DOESN'T mess with your design when it is rendered to the user, you can use the HttpUtility.HtmlEncode method to prevent that!
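To make the encoding option concrete, here is a small sketch. It uses System.Net.WebUtility.HtmlEncode rather than HttpUtility.HtmlEncode, purely so that the example needs no reference to System.Web; the class and method names are made up for the illustration.

```csharp
using System.Net;

// Hypothetical helper: encode user input at output time so that any markup
// in it is displayed as text instead of being interpreted by the browser.
public static class EncodeDemo
{
    public static string EncodeForHtml(string userInput)
    {
        return WebUtility.HtmlEncode(userInput);
    }
}

// EncodeDemo.EncodeForHtml("<script src='bobs.js'>John Doe</script>")
// returns "&lt;script src=&#39;bobs.js&#39;&gt;John Doe&lt;/script&gt;"
```

The browser then shows the literal text of the tag and never loads bobs.js.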

What about using Microsoft Anti-Cross Site Scripting Library?

Reading values from webpage programmatically

I don't know what it is called, but I think this is possible.
I am looking to write something (I don't know the exact name) that will
go to a web page, select a value from a drop-down box on that page, and read values from the page after the selection. I am not sure whether it is called a crawler or something else. I am new to this, but a friend told me a long time ago that it can be done.
Can anyone please give me a head start?
Thanks

You need an HTTP client library (perhaps libcurl in C, or some C# wrapper for it, or some native C# HTTP client library like this).
You also need to parse the retrieved HTML content. So you probably need an HTML parsing library (maybe HTML agility pack).
If the targeted webpage is nearly fixed and has e.g. some comments to ease finding the relevant part, you might use simpler or ad-hoc parsing techniques.
Some sites might send a nearly empty static HTML page, with the actual content being constructed dynamically by JavaScript (Ajax). In that case, you are out of luck with this approach.
Maybe you want some web service ....
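As an illustration of the parsing step, the sketch below lists the option values of a drop-down with HTML Agility Pack. It assumes the HtmlAgilityPack NuGet package is installed; the class name, the method name and the idea of locating the drop-down by its id are assumptions made for the example.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper: lists the values offered by a drop-down box.
public static class DropDownReader
{
    // Returns the value attribute of every <option> inside the <select>
    // element that has the given id.
    public static List<string> ReadOptionValues(string html, string selectId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new List<string>();
        // SelectNodes returns null (not an empty list) when nothing matches.
        var options = doc.DocumentNode.SelectNodes("//select[@id='" + selectId + "']//option");
        if (options == null)
        {
            return values;
        }

        foreach (var option in options)
        {
            values.Add(option.GetAttributeValue("value", string.Empty));
        }
        return values;
    }
}
```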

One simple way (but not the most efficient way) is to simply read the webpage as String using the WebClient, for example:
WebClient Web = new WebClient();
String Data = Web.DownloadString("Address");
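If the page happens to be well-formed XHTML, you can parse the string into an XDocument and look up the tag that represents the drop-down box. Most real-world HTML is not valid XML, though, so XDocument.Parse may throw; in that case use an HTML parser such as HTML Agility Pack instead. Parsing the string into an XDocument is done this way:
XDocument xdoc = XDocument.Parse(Data);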
Update:
If you want to read the result of the selected value, and that result is displayed within the page do this:
Get all the items as I explained.
If the page does not make use of models, then you can pass your selected value as a query-string argument, for example:
www.somepage.com/?Name=YourItem
Read the page again and find the value.
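The steps above could be sketched as follows, under the assumption that the page really does accept the selection as a query-string parameter. The class and method names are invented, and HttpClient is used here in place of the WebClient shown earlier.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for re-requesting a page with a chosen drop-down value.
public static class PageReader
{
    // Appends the selected value as an escaped query-string parameter.
    public static string BuildUrl(string baseUrl, string parameterName, string selectedValue)
    {
        return baseUrl + "?" + Uri.EscapeDataString(parameterName)
                       + "=" + Uri.EscapeDataString(selectedValue);
    }

    // Downloads the page body as a string so it can be parsed afterwards.
    public static async Task<string> DownloadAsync(HttpClient client, string url)
    {
        return await client.GetStringAsync(url);
    }
}

// PageReader.BuildUrl("http://www.somepage.com/", "Name", "Your Item")
// returns "http://www.somepage.com/?Name=Your%20Item"
```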

Finding an element by partial id with Selenium in C#

I am trying to locate an element with a dynamically generated id. The last part of the string is constant ("ReportViewer_fixedTable"), so I can use that to locate the element. I have tried to use regex in XPath:
targetElement = driver.FindElement(
By.XPath("//table[regx:match(@id, 'ReportViewer_fixedTable')]"));
And locating by CssSelector:
targetElement = driver.FindElement(
By.CssSelector("table[id$='ReportViewer_fixedTable']"));
Neither works. Any suggestions would be appreciated.

That is because the CSS selector needs to be modified. You were almost there:
driver.FindElement(By.CssSelector("table[id*='ReportViewer_fixedTable']"));
From https://saucelabs.com/blog/selenium-tips-css-selectors-in-selenium-demystified:
css=a[id^='id_prefix_']
A link with an id that starts with the text id_prefix_.
css=a[id$='_id_sufix']
A link with an id that ends with the text _id_sufix.
css=a[id*='id_pattern']
A link with an id that contains the text id_pattern.
You were using the suffix operator ($=), which I assume did not match because the id does not actually end with that text (I can't be sure without seeing your HTML, so please show it next time). *= matches the text anywhere in the id, so it works in either case.
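To see why $= can fail where *= succeeds, the three operators can be mirrored with plain string comparisons, as in the sketch below. The helper and the two ids in the comments are hypothetical, chosen only to show the difference.

```csharp
using System;

// Hypothetical helper mirroring the CSS attribute operators ^=, $= and *=.
public static class AttributeSelectorDemo
{
    public static bool Matches(string id, string op, string text)
    {
        switch (op)
        {
            case "^=": return id.StartsWith(text, StringComparison.Ordinal); // starts with
            case "$=": return id.EndsWith(text, StringComparison.Ordinal);   // ends with
            case "*=": return id.Contains(text);                             // contains
            default: throw new ArgumentException("Unsupported operator: " + op);
        }
    }
}

// Matches("ctl00_ReportViewer_fixedTable", "$=", "ReportViewer_fixedTable")       is true
// Matches("ctl00_ReportViewer_fixedTable_ctl01", "$=", "ReportViewer_fixedTable") is false
// Matches("ctl00_ReportViewer_fixedTable_ctl01", "*=", "ReportViewer_fixedTable") is true
```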

Try using
targetElement = driver.FindElement(By.XPath("//table[contains(@id, 'ReportViewer_fixedTable')]"));
Note that this matches every element whose id contains 'ReportViewer_fixedTable', not only those whose id ends with it. I will try to find a regex option that would be a more accurate answer to your question.

This solution will work irrespective of the XPath version. First, create a method somewhere in your COMMON helper class.
public static string GetXpathStringForIdEndsWith(string endStringOfControlId)
{
    return "//*[substring(@id, string-length(@id) - string-length(\"" + endStringOfControlId + "\") + 1)=\"" + endStringOfControlId + "\"]";
}
In my case, these are the control IDs in different versions of my product:
v1.0 :: ContentPlaceHolderDefault_MasterPlaceholder_HomeLoggedOut_7_hylHomeLoginCreateUser
v2.0 :: ContentPlaceHolderDefault_MasterPlaceholder_HomeLoggedOut_8_hylHomeLoginCreateUser
Then, you can call the above method to find the control which has static end string.
By.XPath(Common.GetXpathStringForIdEndsWith("<End String of the Control Id>"))
For the control IDs I mentioned for v1 and v2, I use it like this:
By.XPath(Common.GetXpathStringForIdEndsWith("hylHomeLoginCreateUser"))
The overall logic is that you can use the XPath expression below to find a control whose id ends with a particular string:
//*[substring(@id, string-length(@id) - string-length("<EndString>") + 1)="<EndString>"]
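The expression can be checked without a browser, because .NET's System.Xml evaluates XPath 1.0. The sketch below repeats the helper and runs it against a small XML string; the class name EndsWithXPathDemo, the CountMatches method and the sample ids in the comments are assumptions made for the demonstration.

```csharp
using System.Xml;

// Hypothetical demo: evaluates the "id ends with" XPath against an XML string.
public static class EndsWithXPathDemo
{
    // Same expression as in the answer: compare the tail of @id with the suffix.
    public static string GetXpathStringForIdEndsWith(string endStringOfControlId)
    {
        return "//*[substring(@id, string-length(@id) - string-length(\"" + endStringOfControlId
             + "\") + 1)=\"" + endStringOfControlId + "\"]";
    }

    // Counts the elements whose id ends with the given suffix.
    public static int CountMatches(string xml, string suffix)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.SelectNodes(GetXpathStringForIdEndsWith(suffix)).Count;
    }
}

// With ids "Home_7_hylHomeLoginCreateUser", "Home_8_hylHomeLoginCreateUser"
// and "hylHomeLoginCreateUser_other", only the first two are matched for the
// suffix "hylHomeLoginCreateUser".
```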

Sanitize User Input in C# - Cleanest Way?

There's a gotcha when inserting img tags dynamically via scripts.
Take the following code to insert an image for a place:
newPlace.find('.PlaceThumb').append('<img src="' + place.ThumbnailUrl + '" alt="' + place.Name + '" width="50px" style = "padding:2px;"/>');
Someone could name their place: " onload="alert('hi')" and the tag would get rendered as:
<img src="/item.aspx?id=123" alt="" onload="alert('hi')" width="50px" style = "padding:2px;"/>
When the image is loaded, the script will execute.
While only some tags support the onload attribute, this is a good lesson never to trust user input.
What is the "Correct" (nice, elegant, clean, general) way of doing this:
newPlace.find('.PlaceThumb').append('<img src="' + place.ThumbnailUrl + '" alt="' + place.Name.replace('"', '&quot;') + '" width="50px" style = "padding:2px;"/>');
I was thinking maybe with templates you could define an operator on strings that would UUencode them, similar to how a string prefixed with @ in C# has special meaning with regard to backslashes. Is there a way to add this functionality to the standard .NET string class?

Maybe you can use new Uri( yourUrlString ). I believe using that (along with the methods IsWellFormedUriString and IsWellFormedOriginalString) will help you validate the input.

The AntiXSS library is one possible solution. Be very careful with encoding, as your code seems to have a large number of layers between the data and the rendered HTML (ASP.NET -> renders the HTML template as part of rendered JavaScript -> the browser loads the JavaScript -> something executes the script, which in turn uses jQuery to create HTML based on the template).
Note: Consider separating CSS (width and style attributes) from HTML layout as good HTML practice.

I can't tell if you're using jQuery or not. If you are, then you can do something like this:
newPlace.find('.PlaceThumb').append('<img>');
$('.PlaceThumb img').attr('src', place.ThumbnailUrl).attr('alt', place.Name);
That may not be valid, it's just off the top of my head, but should give you something to look into.
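On the question of adding this to the string class: C# does not let you define a new string prefix or operator, but an extension method comes close when the markup is built on the server. The sketch below is one possible shape; the names StringHtmlExtensions and HtmlEncoded are made up, and it relies on System.Net.WebUtility.HtmlEncode for the actual encoding.

```csharp
using System.Net;

// Hypothetical extension method so any string can be encoded inline.
public static class StringHtmlExtensions
{
    // Encodes quotes, angle brackets and ampersands so the value cannot
    // break out of an HTML attribute or inject markup.
    public static string HtmlEncoded(this string value)
    {
        return WebUtility.HtmlEncode(value);
    }
}

// "\" onload=\"alert('hi')".HtmlEncoded()
// returns "&quot; onload=&quot;alert(&#39;hi&#39;)"
```

With this in place, the attribute would be written as "alt=\"" + place.Name.HtmlEncoded() + "\"", and the malicious place name stays inside the alt value.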
