I have probably spent a good 8 hours trying to figure this out, but I keep failing. I have searched for ages for a solution.
I am trying to find a Selenium element by partial id match using XPath (C# Selenium libraries). The following works perfectly fine. The partial text is sel_1-rowse1.
IWebElement elem = wait5.Until(x => x.FindElement(By.XPath("//a[contains(@id,'sel_1-rowsel')]")));
However, when I use a variable named partial instead, it does not work:
string partial = "sel_1-rowse1";
IWebElement search = wait.Until(x => x.FindElement(By.XPath(String.Format("//a[contains(@id,'{0}')]", partial))));
or
IWebElement search = wait.Until(x => x.FindElement(By.XPath(String.Format("//a[contains(@id,{0})]", partial))));
I have tried single quotes, double quotes, and escape characters, but I can't figure this out. I can't even provide the error, as it's picking up a valid id. My brain is severely depleted on this one.
Just an observation: the id in the first example ends with a lowercase 'L' (rowsel), while the one in the second ends with the digit 1 (rowse1). It might just be a copy-paste error, but it's worth checking.
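partial is a keyword in C#, although only a contextual one (it has special meaning only in front of type and member declarations), so the compiler still accepts it as a local variable name.
Renaming partial to something that isn't a keyword costs nothing and rules it out, but on its own it probably won't fix the lookup; the 'l' versus '1' difference in the id is the more likely culprit.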
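A quick way to rule out both problems is to build the XPath from the variable in one place and compare it with the hard-coded string that works. Below is a minimal sketch; XPathText and its methods are hypothetical helpers, not part of Selenium. It assumes XPath 1.0 (what browsers implement), which has no escape character, so a value containing both quote types is split up with concat().

```csharp
using System;
using System.Linq;

public static class XPathText
{
    // Quotes a value as an XPath 1.0 string literal. XPath 1.0 has no escape
    // character, so a value containing both ' and " is split up with concat().
    public static string Literal(string value)
    {
        if (!value.Contains('\'')) return "'" + value + "'";
        if (!value.Contains('"')) return "\"" + value + "\"";
        var pieces = value.Split('\'').Select(p => "'" + p + "'");
        return "concat(" + string.Join(", \"'\", ", pieces) + ")";
    }

    // XPath for <tag> elements whose id contains the given fragment anywhere.
    public static string IdContains(string tag, string fragment)
    {
        return "//" + tag + "[contains(@id, " + Literal(fragment) + ")]";
    }
}
```

With Selenium this would be used as wait.Until(x => x.FindElement(By.XPath(XPathText.IdContains("a", idFragment)))). Printing the result for "sel_1-rowsel" and "sel_1-rowse1" side by side makes the one-character difference easy to spot.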
I'm working on a movie scraper / auto-downloader that iterates over my current movie collection, finds new recommendations, and downloads the new goods.
There is a part where I scrape IMDb for metadata, and it seems to get stuck in this one spot; I can't figure out why. It has run this same code against other IMDb pages just fine (this is the 29th iteration with a new page).
I am using C#.
The code:
private string Match(string regex, string html, int i = 1)
{
return new Regex(regex, RegexOptions.Multiline).Match(html).Groups[i].Value.Trim();
}
regex parameter string contents:
<title>.*?\\(.*?(\\d{4}).*?\\).*?</title>
html parameter string contents: too big to paste here, but literally the html string representation of http://www.imdb.com/title/tt4422748/combined
If you are in Chrome, you can view it easily with:
view-source:http://www.imdb.com/title/tt4422748/combined
I have paused execution in Visual Studio and tried to step forward; it continues to run but just hangs (it doesn't let me step, it just runs). If I hit pause again, it returns to the same spot with the same parameter values (and no, I am not calling it in an infinite loop). I'm pretty new to Regex, so any help would be appreciated!
Using .* (or its lazy form .*?, as in this pattern) is like saying "match anything, or nothing at all". When several of them are chained and the overall match fails, the engine backtracks through so many different ways of splitting the text between them that it becomes unresponsive and appears to lock up.
Does the person designing the pattern really not know whether there will be text in the title? I bet 99% of the time the title has text, so why is .* even used? How about .+ at least?
If you want the text between the delimiters, use this:
title\>(?<Title>[^<]+)\</title
Then extract the matched text through the named group "Title" instead of Groups[0] (which holds the whole match). Groups[1] will hold the same text as well, if one loathes named captures.
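A minimal sketch of that advice in C#, assuming the page has a single <title> element. It swaps the original one-shot pattern for two steps: the title is captured with a negated character class, and the year is then searched for only inside that short string. Both regexes use the Regex constructor overload that takes a TimeSpan match timeout (available since .NET 4.5), so a bad case throws RegexMatchTimeoutException instead of hanging. TitleParser is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleParser
{
    // [^<]+ cannot run past the closing tag, so there is little to backtrack over.
    // The TimeSpan argument makes a pathological match throw
    // RegexMatchTimeoutException instead of appearing to hang.
    private static readonly Regex TitleRegex = new Regex(
        @"<title>(?<Title>[^<]+)</title>", RegexOptions.IgnoreCase, TimeSpan.FromSeconds(2));

    // Applied only to the short title text: a 4-digit year somewhere inside parentheses.
    private static readonly Regex YearRegex = new Regex(
        @"\([^()]*?(?<Year>\d{4})[^()]*\)", RegexOptions.None, TimeSpan.FromSeconds(2));

    public static string GetTitle(string html)
    {
        Match m = TitleRegex.Match(html);
        return m.Success ? m.Groups["Title"].Value.Trim() : string.Empty;
    }

    public static string GetYear(string html)
    {
        Match m = YearRegex.Match(GetTitle(html));
        return m.Success ? m.Groups["Year"].Value : string.Empty;
    }
}
```

Because the year search never sees more than the title text, even a worst-case backtrack is bounded by the length of the title rather than the whole page.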
Answer for Regex Haters
Use the HTML Agility Pack.
I have been having trouble finding a solution to this problem.
I am parsing the content of a number of ebooks, finding specific terms and characters, marking the locations and lengths of each term.
A normal case would be something like this (excerpts from A Game of Thrones):
"When he paused to look down, his head swam dizzily and he felt his fingers slipping. Bran cried out and clung for dear life."
If we are searching for the character "Bran", its location is 85 and length is 4. Easy enough.
My issue arises when there is a paragraph like this:
<span height="-0em"><font size="7">D</font></span>aenerys Targaryen wed Khal Drogo
We need to match "Daenerys Targaryen". It is easy enough to strip the HTML and match the string, but in this example the result needs to include the HTML. Thus the expected result here would be location = 0, length = 67.
Another situation, caused by random anchor tags scattered throughout:
Did anyone outside the Vale even suspect where Catelyn <a></a>Stark had taken him?
Again, searching for "Catelyn Stark" needs to include the HTML, so location = 47, length = 20.
I have been able to get around it temporarily by adding those specific cases (searching for "Catelyn <a></a>Stark" specifically), but clearly I need a more robust solution, which I cannot seem to get my head around. My attempts have used RegEx, with limited success.
I have found various questions regarding HTML matching/stripping (and whether or not to use RegEx at all), but this case seems to be somewhat unique.
Stripping the tags isn't an option as the content must be preserved.
This is within a stand-alone C# application.
Any ideas, steps in the right direction, or similar examples should your search go better than mine would be greatly appreciated!
One possible approach would be to insert the following between each letter in your search string:
(?:<[^>]*>)*
So when searching for the character "Bran" your regex would become the following:
(?:<[^>]*>)*B(?:<[^>]*>)*r(?:<[^>]*>)*a(?:<[^>]*>)*n
This will allow your regex to match any number of HTML tags anywhere within the search string. Note that this will only work if your search strings are always something simple like a character's name, and not regular expressions (this method will fail if there is repetition like a* in your search string).
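A minimal C# sketch of this approach, assuming plain search terms such as character names. Each character is passed through Regex.Escape so that a '.' or '(' in a term is matched literally, and Matches() supplies the location (Index) and length of every hit. TagTolerantSearch and FindAll are illustrative names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagTolerantSearch
{
    // Zero or more complete HTML tags, e.g. "<a></a>" or "<font size=\"7\">".
    private const string Tags = "(?:<[^>]*>)*";

    // Turns "Bran" into (?:<[^>]*>)*B(?:<[^>]*>)*r(?:<[^>]*>)*a(?:<[^>]*>)*n.
    // Regex.Escape makes characters such as '.' or '(' in a term match literally.
    public static Regex BuildPattern(string term) =>
        new Regex(string.Concat(term.Select(c => Tags + Regex.Escape(c.ToString()))));

    // Location and length of every occurrence, including any tags before or inside it.
    public static (int Location, int Length)[] FindAll(string html, string term) =>
        BuildPattern(term).Matches(html)
            .Cast<Match>()
            .Select(m => (Location: m.Index, Length: m.Length))
            .ToArray();
}
```

Run against the excerpts in the question, this gives 85/4 for "Bran", 0/67 for "Daenerys Targaryen" and 47/20 for "Catelyn Stark", with the surrounding tags counted in the length.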
I would create a function that takes "Daenerys Targaryen" as a parameter and strips the first letter. Then it would only search for "aenerys Targaryen", and if that is found, it would search for ">D<" (or whichever letter was stripped). Does that make sense?
Example:
public static string SearchFor(string str)
{
string result = null; // placeholder until the steps below are filled in
// strip first letter of search string (in this case "D")
// search for the rest of the string ("aenerys Targaryen")
// if found, search for ">D<"
// if found, search for HTML tags with "D" inside (using regex)
// if found, search for HTML tags with the previous HTML tag in them (using regex)
return result;
}
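One way the first steps of that outline could be filled in, as a sketch under the assumption that only complete tags (like the drop-cap span/font markup) sit between the first letter and the rest of the name. It does not cover tags further inside the name, such as Catelyn <a></a>Stark, which the per-letter regex answer handles. FirstLetterSearch is a hypothetical name.

```csharp
using System;

public static class FirstLetterSearch
{
    // Assumes only complete tags sit between the first letter and the rest of the term,
    // as in <span ...><font ...>D</font></span>aenerys. Returns (-1, 0) if not found.
    public static (int Location, int Length) Find(string html, string term)
    {
        if (term.Length < 2) throw new ArgumentException("Term needs at least two characters.", nameof(term));
        string rest = term.Substring(1); // strip the first letter
        int restAt = html.IndexOf(rest, StringComparison.Ordinal);
        while (restAt >= 0)
        {
            // Step back over closing tags such as "</span>" and "</font>".
            int i = restAt - 1;
            while (i >= 0 && html[i] == '>')
                i = html.LastIndexOf('<', i) - 1;

            if (i >= 0 && html[i] == term[0])
            {
                // Also include opening tags directly in front of the letter.
                int start = i;
                while (start > 0 && html[start - 1] == '>')
                {
                    int open = html.LastIndexOf('<', start - 1);
                    if (open < 0) break;
                    start = open;
                }
                return (start, restAt + rest.Length - start);
            }
            // The letter did not match here; try the next occurrence of the rest.
            restAt = html.IndexOf(rest, restAt + 1, StringComparison.Ordinal);
        }
        return (-1, 0);
    }
}
```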
Well, using JavaScript or PHP you can get the text of elements and of whole documents, search that, and then use a regex to return the closest match (including the HTML).
Another option:
would be to index the books first using something like the Lucene search engine (which happens to let you index different formats, HTML being one of them).
You can then use the Lucene api to search your documents a little easier.
In PHP we have Zend_Search_Lucene, which works perfectly for this kind of thing.
Lucene Search can be found at:
http://lucene.apache.org/core/
Have fun!
I am trying to locate an element with a dynamically generated id. The last part of the string is constant ("ReportViewer_fixedTable"), so I can use that to locate the element. I have tried to use regex in XPath:
targetElement = driver.FindElement(
By.XPath("//table[regx:match(@id, 'ReportViewer_fixedTable')]"));
And locating by CssSelector:
targetElement = driver.FindElement(
By.CssSelector("table[id$='ReportViewer_fixedTable']"));
Neither works. Any suggestions would be appreciated.
That is because the CSS selector needs a small change; you were almost there:
driver.FindElement(By.CssSelector("table[id*='ReportViewer_fixedTable']"));
From https://saucelabs.com/blog/selenium-tips-css-selectors-in-selenium-demystified:
css=a[id^='id_prefix_']
A link with an id that starts with the text id_prefix_.
css=a[id$='_id_sufix']
A link with an id that ends with the text _id_sufix.
css=a[id*='id_pattern']
A link with an id that contains the text id_pattern.
You were using a suffix match ($=), which I'm assuming was not the right part of the id to match on (I can't tell without seeing your HTML, so try posting it next time). *= is the most forgiving option, since it matches the fragment anywhere in the id, though it can also pick up other elements whose ids contain the same text.
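To see why $= can miss where *= hits, here is a sketch that mirrors the three operators with ordinary C# string checks (case-sensitive, and an empty fragment matches nothing, as in CSS). CssAttributeMatch is a hypothetical helper, and the id ctl00_ReportViewer_fixedTable_0 used below is made up to stand in for an id that has extra text after the constant part.

```csharp
using System;

public static class CssAttributeMatch
{
    // Mirrors the CSS attribute operators ^=, $= and *= with case-sensitive string checks.
    public static bool Matches(string op, string attributeValue, string fragment)
    {
        // In CSS these three operators never match when the fragment is empty.
        if (fragment.Length == 0) return false;
        switch (op)
        {
            case "^=": return attributeValue.StartsWith(fragment, StringComparison.Ordinal);   // starts with
            case "$=": return attributeValue.EndsWith(fragment, StringComparison.Ordinal);     // ends with
            case "*=": return attributeValue.IndexOf(fragment, StringComparison.Ordinal) >= 0; // contains
            default: throw new ArgumentException("Unsupported operator: " + op, nameof(op));
        }
    }
}
```

For example, Matches("$=", "ctl00_ReportViewer_fixedTable_0", "ReportViewer_fixedTable") returns false, while the same call with "*=" returns true.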
Try using:
targetElement = driver.FindElement(By.XPath("//table[contains(@id, 'ReportViewer_fixedTable')]"));
Note this matches every table whose id contains 'ReportViewer_fixedTable' anywhere, not only those whose id ends with it. I will try to find a regex option that would be a more accurate answer to your question.
This solution works irrespective of the XPath version. First, create a method somewhere in your Common helper class:
public static string GetXpathStringForIdEndsWith(string endStringOfControlId)
{
return "//*[substring(@id, string-length(@id)- string-length(\"" + endStringOfControlId + "\") + 1 )=\"" + endStringOfControlId + "\"]";
}
In my case, these are the control IDs in different versions of my product:
v1.0 :: ContentPlaceHolderDefault_MasterPlaceholder_HomeLoggedOut_7_hylHomeLoginCreateUser
v2.0 :: ContentPlaceHolderDefault_MasterPlaceholder_HomeLoggedOut_8_hylHomeLoginCreateUser
Then, you can call the above method to find the control which has static end string.
By.XPath(Common.GetXpathStringForIdEndsWith("<End String of the Control Id>"))
For the control IDs I mentioned for v1 and v2, I use:
By.XPath(Common.GetXpathStringForIdEndsWith("hylHomeLoginCreateUser"))
The overall logic is that you can use the XPath expression below to find a control whose id ends with a particular string:
//*[substring(@id, string-length(@id)- string-length("<EndString>") + 1 )="<EndString>"]
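The +1 comes from XPath's 1-based string positions. Browsers generally implement only XPath 1.0, which has starts-with() but no ends-with() (that function arrived in XPath 2.0), hence the substring() workaround. The sketch below mirrors the XPath 1.0 substring() rules in C# so the arithmetic can be checked outside a browser; XPathEndsWithCheck is a hypothetical helper, not part of Selenium.

```csharp
using System;

public static class XPathEndsWithCheck
{
    // Emulates XPath 1.0 substring(s, start) for an integer start: positions are 1-based,
    // a start of 1 or less returns the whole string, and a start past the end returns "".
    public static string XPathSubstring(string s, int start)
    {
        if (start <= 1) return s;
        if (start > s.Length) return "";
        return s.Substring(start - 1);
    }

    // Same arithmetic as //*[substring(@id, string-length(@id) - string-length(end) + 1) = end]
    public static bool IdEndsWith(string id, string end) =>
        XPathSubstring(id, id.Length - end.Length + 1) == end;
}
```

Note that when the end string is longer than the id, the start position drops below 1, the whole id is returned, and the comparison correctly fails.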
I am using Visual Studio 2010 to write Selenium 2 WebDriver automated tests in C#.
I have searched high and low for examples of using variables in selectors and have found nothing that seems to work. The one example I found of a variable used in a selector had the variable prefixed with $ and enclosed in {}.
An example of what I am trying to do is below:
string surveyName = "Selenium test survey";
Driver.FindElement(By.CssSelector("tr[svd='${surveyName}']"))
I get the error:
OpenQA.Selenium.WebDriverException : Unexpected error. Unable to find element using css: tr[svd='${surveyName}']
If I 'hard code' the selector like this:
Driver.FindElement(By.CssSelector("tr[svd='Selenium test survey']"))
it finds the element.
svd is an attribute of the tr element. I am trying to select a row within a table by the value of this attribute. The text will be different for each test and so must be a variable.
I have tried expressing the variable a number of different ways but had no luck making this work. Any help would be much appreciated.
Thanks.
string surveyName = "Selenium test survey";
Driver.FindElement(By.CssSelector(String.Format("tr[svd='{0}']", surveyName)));
will do what you want. This is C#, so wherever a method takes a string, you can build that string however you like (String.Format, concatenation, and so on).
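The ${surveyName} form in the question is template syntax from other languages (JavaScript template literals, for example); inside an ordinary C# string it is just literal text, which is why the error message shows it unchanged. C# 6 (Visual Studio 2015 and later) added its own interpolation, with the $ placed before the opening quote. A small sketch comparing the two; SelectorText is a hypothetical name, and a survey name containing a single quote would still need escaping.

```csharp
using System;

public static class SelectorText
{
    // String.Format works with every C# version, including the compiler in Visual Studio 2010.
    public static string WithFormat(string surveyName)
    {
        return String.Format("tr[svd='{0}']", surveyName);
    }

    // Interpolation needs C# 6 (Visual Studio 2015 or later). The $ goes before the
    // opening quote; "${surveyName}" inside a normal string is just literal text.
    public static string WithInterpolation(string surveyName)
    {
        return $"tr[svd='{surveyName}']";
    }
}
```

Either result can be passed straight to By.CssSelector, e.g. Driver.FindElement(By.CssSelector(SelectorText.WithFormat(surveyName))).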
I am trying to find all of the links in the source code of a website. Could anyone tell me the expression I would need to put in my Regex to find these?
Duplicate of (among others): Regular expression for parsing links from a webpage?
Google finds more: html links regex site:stackoverflow.com
I'm not certain how these would translate to C# (I haven't done any development in C# myself yet), but here's how I might do it in JavaScript or ColdFusion. It might give you an idea about how you want to do it in C#.
In JavaScript I think this would work:
var rex = /href="([^"]+)"/g, a = [], m;
while ((m = rex.exec(source)) !== null) a.push(m[1]);
after which a would be an array containing the links, though I haven't run this against a real page. The g flag makes each exec() call continue from where the previous match ended, and capture group 1 holds just the URL, without the surrounding href="...".
By comparison in ColdFusion you would have to do something slightly different:
a = REMatch('href="[^"]+"', source);
for (i = 1; i <= ArrayLen(a); i++) {
a[i] = mid(a[i], 7, len(a[i]) - 7);
}
Again, I haven't tested it, but REMatch returns an array of the matched substrings, and the for loop then strips the href=" prefix (6 characters) and the closing quote from each one, leaving the actual URL. ColdFusion arrays start at 1, hence i <= ArrayLen(a).
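For reference, a C# sketch of the same idea using Regex.Matches, assuming href values are quoted with either double or single quotes (unquoted values are skipped). .NET allows the same group name in both alternatives, so url holds whichever one matched. For messy real-world markup, the HTML Agility Pack suggested elsewhere on this page is the sturdier option. LinkExtractor is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Matches href="..." or href='...' (case-insensitive, optional spaces around =).
    private static readonly Regex HrefRegex = new Regex(
        @"href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase);

    public static List<string> GetLinks(string html)
    {
        var links = new List<string>();
        foreach (Match m in HrefRegex.Matches(html))
            links.Add(m.Groups["url"].Value); // the URL without the surrounding quotes
        return links;
    }
}
```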