C# Watin Find.ByText with Regex - c#

I have the following problem:
I'm trying to get an element from a webpage using WatiN's Find.ByText, but I can't get it to work with a regex in C#.
This statement will return the desired element.
return this.Document.Element(Find.ByText("781|262"));
When I try to use regex, I get back the whole page.
return this.Document.Element(Find.ByText(new Regex(@"781\|262")));
I am trying to get this element:
<td>781|262</td>
I also tried
return this.Document.Element(Find.ByText(Predicate));
private bool Predicate(string s)
{
return s.Equals("781|262");
}
The above works, while this does not:
private bool Predicate(string s)
{
return new Regex(@"781\|262").IsMatch(s);
}
I have now realized that, in the predicate, s is the content of the whole page. I guess the issue is with Document.Element.
Any help appreciated, thank you.

Try:
return this.Document.Element(Find.ByText(new Regex("781\\|262")));
or
return this.Document.Element(Find.ByText(new Regex("781|262")));
Choose whichever fits your needs: the first matches the literal text 781|262, while in the second the unescaped | is alternation, so it matches either 781 or 262.
You don't need a verbatim string to instantiate the Regex class; in a regular string literal, just double the backslash.

Well, I did not realize that the regex would also match the body and html elements, since their text obviously contains the pattern too. I had to require that the text begin and end with the pattern by using ^ and $, so that it only matches the desired element:
^781\u007c262$
\u007c matches |, I used this since MSDN documentation also did.
The final code:
<td>781|262</td>
return Document.TableCell(Find.ByText(new Regex(@"^\d{3}\|\d{3}$")));
Document.TableCell speeds up the search by only trying the regex on td elements.
@ makes the string verbatim, which prevents C# from interpreting \ as the start of an escape sequence.
^ only matches elements whose text begins with the following pattern
\d{3} matches a digit (0-9) 3 times
\| matches | literally
\d{3} matches a digit (0-9) 3 times
$ the element's text must also end with this pattern
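The effect of the anchors can be checked without a browser. The following is a minimal sketch, assuming the two strings below stand in for the text of the td cell and the text of an enclosing element; the class, constant, and method names are hypothetical and not part of WatiN.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical stand-ins for element texts: the <td> itself and an enclosing element.
    public const string CellText = "781|262";
    public const string PageText = "Some header 781|262 and more page text";

    // Unanchored: succeeds whenever the pattern occurs anywhere in the text.
    public static bool Unanchored(string text) =>
        Regex.IsMatch(text, @"\d{3}\|\d{3}");

    // Anchored: succeeds only when the whole text is the pattern.
    public static bool Anchored(string text) =>
        Regex.IsMatch(text, @"^\d{3}\|\d{3}$");
}
```

Here AnchorDemo.Unanchored succeeds for both strings, while AnchorDemo.Anchored succeeds only for CellText, which is why the anchored pattern no longer selects the enclosing elements.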

Related

length in regular expression [duplicate]

I need a regex that will only find matches where the entire string matches my query.
For instance if I do a search for movies with the name "Red October" I only want to match on that exact title (case insensitive) but not match titles like "The Hunt For Red October". Not quite sure I know how to do this. Anyone know?
Thanks!
Try the following regular expression:
^Red October$
By default, regular expressions are case sensitive. The ^ marks the start of the matching text and $ the end.
Generally, and with default settings, ^ and $ anchors are a good way of ensuring that a regex matches an entire string.
A few caveats, though:
If you have alternation in your regex, be sure to enclose your regex in a non-capturing group before surrounding it with ^ and $:
^foo|bar$
is of course different from
^(?:foo|bar)$
Also, ^ and $ can take on a different meaning (start/end of line instead of start/end of string) if certain options are set. In text editors that support regular expressions, this is usually the default behaviour. In some languages, especially Ruby, this behaviour cannot even be switched off.
Therefore there is another set of anchors that are guaranteed to only match at the start/end of the entire string:
\A matches at the start of the string.
\Z matches at the end of the string or before a final line break.
\z matches at the very end of the string.
But not all languages support these anchors, most notably JavaScript.
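In .NET the difference between the two sets of anchors shows up with a trailing newline or with RegexOptions.Multiline. A small sketch to illustrate, with hypothetical class and method names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeStringAnchors
{
    // ^...$ : in .NET, $ also matches just before a final newline.
    public static bool WithCaretDollar(string input) =>
        Regex.IsMatch(input, "^Red October$", RegexOptions.IgnoreCase);

    // \A...\z : must span the entire string, even when Multiline is set.
    public static bool WithAz(string input) =>
        Regex.IsMatch(input, @"\ARed October\z",
                      RegexOptions.IgnoreCase | RegexOptions.Multiline);
}
```

WithCaretDollar accepts "Red October\n" because $ tolerates one final line break, whereas WithAz rejects it. RegexOptions.IgnoreCase covers the case-insensitive part of the question.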
I know that this may be a little late to answer, but maybe it will come in handy for someone else.
Simplest way:
var someString = "...";
var someRegex = "...";
var match = Regex.Match(someString, someRegex);
if(match.Success && match.Value.Length == someString.Length){
//pass
} else {
//fail
}
Use the ^ and $ anchors to denote where the regex pattern sits relative to the start and end of the string:
Regex.Match("Red October", "^Red October$"); // pass
Regex.Match("The Hunt for Red October", "^Red October$"); // fail
You need to enclose your regex in ^ (start of string) and $ (end of string):
^Red October$
If the string may contain regex metacharacters (. { } ( ) $ etc.), I propose using
^\QYourString\E$
\Q quotes all the characters that follow, up to \E.
Otherwise the regex can be inappropriate or even invalid.
If the language takes the regex as a string parameter (as in the example), the backslashes must be doubled:
^\\QYourString\\E$
Hope this tip helps somebody.
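A note for .NET, which the other examples here use: as far as I am aware its regex engine does not accept the \Q...\E syntax, and Regex.Escape serves the same purpose. The following sketch assumes a hypothetical helper named IsExactly:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralMatch
{
    // Returns true when input equals literal, ignoring case.
    // Regex.Escape neutralises metacharacters such as . ( ) $ in the literal.
    public static bool IsExactly(string input, string literal)
    {
        string pattern = @"\A" + Regex.Escape(literal) + @"\z";
        return Regex.IsMatch(input, pattern, RegexOptions.IgnoreCase);
    }
}
```

With the title "Mr. & Mrs. Smith (2005)" as the literal, the escaped dot no longer matches an arbitrary character, so "Mrx & Mrs. Smith (2005)" is rejected.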
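Sorry, but that's a little unclear.
From what I read, you want to do a simple string comparison. You don't need a regex for that.
string myTest = "Red October";
// Compare case-insensitively without creating lowercased copies.
bool isMatch = string.Equals(myTest, "Red October", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(isMatch); // True
isMatch = string.Equals(myTest, "The Hunt for Red October", StringComparison.OrdinalIgnoreCase);
Console.WriteLine(isMatch); // False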
You can do it like this. For example, the pattern below matches a three-character string whose middle character is its only lowercase e; you can check it with myRegex.IsMatch():
^[^e][e]{1}[^e]$

How can i make my regular expression work?

I am new to both .NET (C#) and regular expressions.
I need a regular expression to match against a url:
If url string contains "/id/Whatever_COMES_HERE_EVERY_CHAR_ACCEPTED/" : return true
If url string contains only "/id/" : return false
I have tried the following but it only returns true if url is http:// localhost/id/
This is my script:
string thisUrl = HttpContext.Current.Request.Url.AbsolutePath;
Match match = Regex.Match(thisUrl, @"/id/*$");
What am I doing wrong?
You have this:
/id/*$
This matches the literal string /id followed by a /, to which the quantifier * applies, meaning the / may occur 0 or more times. Then comes $, which means the end of the string.
So you are looking for repetitions of the literal /, which is not what you want. (This means http:// localhost/id/////////////////// would also match your original regex.)
What you need is something like this:
/id/.+$
This will match the literal /id/ followed by the . which in regex means any character which is quantified with the + which means 1 or more.
You could tighten it up and use \S instead of . which means non-whitespace characters (since a URL shouldn't have whitespace)
Also note: there are a variety of online regex tools which are really useful when trying to figure out and test a regex. A couple of examples:
http://rubular.com/
http://regex101.com/
http://www.regxlib.com/
And even extension for visual studio you can use:
https://visualstudiogallery.msdn.microsoft.com/bf883ae3-188b-43bc-bd29-6235c4195d1f
When you use the star, it signals that 0 or more of the preceding character may be present. You will want to use
"/id/.+" to signal that at least one more character must come after the /.
If you're just looking for a true/false answer, you should use the IsMatch() function. The other issue is that * (zero or more) and + (one or more) are quantifiers and apply to the preceding character, character class, or group. The dot (.) matches any character except a newline. So the correct solution for your problem would be:
Regex.IsMatch(thisUrl, @"/id/.+$");
Considering that the input is a URL, this regex can be improved upon by restricting character classes to valid URL characters only, but for your purpose the above should be sufficient.
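To compare the original pattern with the suggested one, here is a short sketch; the class and method names are hypothetical and the sample paths are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdUrl
{
    // Pattern from the question: "/id" followed by zero or more slashes, then end of string.
    public static bool OriginalPattern(string path) =>
        Regex.IsMatch(path, @"/id/*$");

    // Suggested pattern: "/id/" followed by at least one more character.
    public static bool HasIdValue(string path) =>
        Regex.IsMatch(path, @"/id/.+$");
}
```

OriginalPattern accepts "/id/" and "/id////" but rejects "/id/some-value/", which is the opposite of what was asked; HasIdValue rejects "/id/" and accepts "/id/some-value/".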

What is wrong with my regex (simple)?

I am trying to make a regex that matches all occurrences of words that are at the start of a line and begin with #.
For example in:
#region #like
#hey
It would match #region and #hey.
This is what I have right now:
^#\w*
I apologize for posting this question. I'm sure it has a very simple answer, but I have been unable to find it. I admit that I am a regex noob.
What you've got should work, depending on what flags you pass for RegexOptions. You need to make sure you pass RegexOptions.Multiline:
var matches = Regex.Matches(input, @"^#\w*", RegexOptions.Multiline);
From the RegexOptions documentation:
Multiline: Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
The regex looks fine. Make sure you're using a verbatim string literal (@ prefix) to define your regex, i.e. @"^#\w*"; otherwise the backslash will be treated as the start of an escape sequence.
Use this regex
^#.+?\b
.+ ensures at least one character after # and \b indicates a word boundary. The ? makes the + operator non-greedy, so as to avoid matching the whole string #region #like
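The effect of RegexOptions.Multiline on the pattern from the question can be seen in a small sketch; the class and method names are hypothetical, and the input uses \n as the line separator.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class HashWords
{
    // Returns every match of ^#\w* in the input, using the given options.
    public static string[] Find(string input, RegexOptions options) =>
        Regex.Matches(input, @"^#\w*", options)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For the input "#region #like\n#hey", RegexOptions.Multiline yields #region and #hey, while RegexOptions.None yields only #region because ^ then matches only at the very start of the string.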

Matching an (easy??) regular expression using C#'s regex

Ok sorry this might seem like a dumb question but I cannot figure this thing out :
I am trying to parse a string and simply want to check whether it only contains the following characters : '0123456789dD+ '
I have tried many things but just can't get to figure out the right regex to use!
Regex oReg = new Regex(@"[\d dD+]+");
oReg.IsMatch("e4");
will return true even though e is not allowed...
I've tried many strings, including Regex("[1234567890 dD+]+")...
It always works on Regex Pal but not in C#...
Please advise, and again I apologize; this seems like a very silly question.
Try this:
@"^[0-9dD+ ]+$"
The ^ and $ at the beginning and end signify the beginning and end of the input string respectively. Thus between the beginning and the end only the stated characters are allowed. In your example, the regex matches if the string contains one of the characters even if it contains other characters as well.
@comments: Thanks, I fixed the missing + and space.
Oops, you forgot the boundaries, try:
Regex oReg = new Regex(@"^[0-9dD +]+$");
oReg.IsMatch("e4");
^ matches the beginning of the text stream, $ matches the end.
It is matching the 4; you need ^ and $ to anchor the regex if you want a full match for the entire string, i.e.
Regex re = new Regex(@"^[\d dD+]+$");
Console.WriteLine(re.IsMatch("e4"));
Console.WriteLine(re.IsMatch("4"));
This is because regular expressions can also match parts of the input; in this case it just matches the "4" of "e4". If you want to match a whole line, you have to surround the regex with "^" (matches line start) and "$" (matches line end).
So to make your example work, you have to write it as follows:
Regex oReg = new Regex(@"^[\d dD+]+$");
oReg.IsMatch("e4");
I believe it's returning True because it's finding the 4. Nothing in the regex excludes the letter e from the results.
Another option is to invert everything, so it matches on characters you don't want to allow (note the space inside the class, since spaces are allowed):
Regex oReg = new Regex(@"[^0-9dD+ ]");
!oReg.IsMatch("e4");
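The partial-match behaviour described in these answers can be reproduced side by side. A minimal sketch, with hypothetical class and method names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowedChars
{
    // Pattern from the question: succeeds if any run of allowed characters occurs.
    public static bool ContainsAllowed(string s) =>
        Regex.IsMatch(s, @"[\d dD+]+");

    // Anchored pattern: succeeds only if every character is allowed.
    public static bool OnlyAllowed(string s) =>
        Regex.IsMatch(s, @"^[0-9dD+ ]+$");
}
```

ContainsAllowed("e4") succeeds because of the 4, whereas OnlyAllowed("e4") fails and OnlyAllowed("2d6 + 3") succeeds. Writing [0-9] instead of \d is a deliberate choice in this sketch: in .NET, \d also matches non-ASCII Unicode digits unless RegexOptions.ECMAScript is set.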
