.NET Regex: Does the {0} quantifier work? - c#

In LINQPad (.NET), all of these expressions return "True":
new Regex(@"\w{0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"(\w){0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"[\w]{0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"([\w]){0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"\w{0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"(\w){0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"[\w]{0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"([\w]){0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"([a]){0,0}").IsMatch("aaaaZZZ").Dump();
Why?

I'm assuming that your plan is to make sure that a certain character isn't present in the source string by using the {0} quantifier on it. That's not going to work like this. The {0} quantifier itself is useless here: it means "match the previous token zero times", which succeeds on every string, even the empty string. Zero is only useful as a lower bound, for example in a{0,5} to match zero to five a characters.
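To see why every call returns True, it helps to look at what the match actually contains. A minimal sketch, assuming a .NET 6 or later console project with top-level statements (the sample strings are only illustrations):

```csharp
using System;
using System.Text.RegularExpressions;

// \w{0} repeats \w zero times, so it consumes no characters at all.
Match m = Regex.Match("aa aa ZZ Z", @"\w{0}");
Console.WriteLine($"Success={m.Success}, Index={m.Index}, Length={m.Length}"); // Success=True, Index=0, Length=0

// Even the empty input has one position where the empty pattern can match.
Console.WriteLine(Regex.IsMatch("", @"(\w){0}")); // True

// Enumerating matches shows an empty match at every position, including the end.
Console.WriteLine(Regex.Matches("abc", @"\w{0}").Count); // 4
```

The match always succeeds at index 0 with length 0, which is exactly what IsMatch reports as True.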
Regexes are designed to match text, so you need to go through some contortions to make them not match text. For example:
new Regex(@"^\W*$") // shorthand for new Regex(@"^[^\w]*$")
matches only if the entire string consists of non-word characters (anything other than letters, digits, and underscore).
new Regex(@"^[^a]*$")
matches only if the entire string consists of characters other than a.
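A small sketch of the "must not contain a" check, again assuming .NET 6+ top-level statements. The sample strings are illustrative, and the string.Contains comparison at the end is an alternative added here, not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// ^ and $ force [^a]* to cover the whole string, so a single 'a' anywhere makes it fail.
var noA = new Regex(@"^[^a]*$");

Console.WriteLine(noA.IsMatch("aaaaZZZ")); // False
Console.WriteLine(noA.IsMatch("ZZZ"));     // True
Console.WriteLine(noA.IsMatch(""));        // True: an empty string contains no 'a'

// For a single forbidden character, a plain string check does the same job without a regex.
Console.WriteLine(!"aaaaZZZ".Contains('a')); // False
```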

Regex is better at positive assertions than negative ones. new Regex(@"\w{0}") is the same as new Regex(@""). {0} means to match zero instances of \w. Since there is nothing else in the regex, it will match all input strings.

Each of your expressions matches a zero-width (empty) string, and an empty string is present in every string in the world. That is why they all return true.

Related

Search for 2 specific letters followed by 4 numbers Regex

I need to check whether a string begins with 2 specific letters and is then followed by any 4 digits.
The 2 letters are "BR", so BR1234 would be valid, and so would BR7412, for example.
What code do I need to check that the string matches the regex in C#?
The regex I have written is below; there is probably a more efficient way of writing it (I'm new to regex):
[B][R][0-9][0-9][0-9][0-9]
You can use this:
Regex regex = new Regex(@"^BR\d{4}");
^ anchors the match at the start of the string (so there can be no other characters before BR)
BR matches - well - BR
\d is a digit (in .NET this also includes other Unicode decimal digits; use [0-9] if you want ASCII digits only)
{4} says there must be exactly 4 of the preceding token (\d)
You did not specify what is allowed to follow the four digits. If this should be the end of the string, add a $.
Usage in C#:
string matching = "BR1234";
string notMatching = "someOther";
Regex regex = new Regex(@"^BR\d{4}");
bool doesMatch = regex.IsMatch(matching); // true
doesMatch = regex.IsMatch(notMatching); // false
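To illustrate the note about $, here is a sketch comparing the pattern with and without the end anchor (assuming .NET 6+ top-level statements; the test strings are made up):

```csharp
using System;
using System.Text.RegularExpressions;

var prefixOnly = new Regex(@"^BR\d{4}");  // anything may follow the four digits
var exact = new Regex(@"^BR\d{4}$");      // nothing may follow the four digits

foreach (var s in new[] { "BR1234", "BR7412", "BR12345", "XBR1234", "br1234" })
{
    // Matching is case-sensitive, so "br1234" fails both patterns.
    Console.WriteLine($"{s}: prefixOnly={prefixOnly.IsMatch(s)}, exact={exact.IsMatch(s)}");
}
```

Only "BR12345" separates the two: it passes the prefix pattern but not the anchored one. One more .NET detail: without RegexOptions.Multiline, $ also matches just before a final newline, so "BR1234\n" passes the exact pattern; \z is the stricter anchor if that matters.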
BR\d{4}

Regex issue parsing Season / Episode pattern

Why doesn't this regex pattern parse the string "Season 02 Episode 01" properly?
For example, this is not a match:
var fileName = "Its Always Sunny in Philadelphia Season 02 Episode 01 - Charlie Gets Crippled.avi";
// Regex explanation:
// Starts with "S" and can contain more letters, can continue with space, then contains two numbers.
// Then starts with "E" again and can contain more letters, can continue with space, then contains two numbers.
var pattern = @"S\w?\s?(\d\d)\s?E\w?\s?(\d\d)";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(fileName);
Use * instead of ?
? is for 0 or 1 time. * is for 0 or more times.
Starts with "S" and can contain more letters [...]
You mean +, not ?.
var pattern = @"S\w+\s+(\d+)\s+E\w+\s+(\d+)";
Note that this regex is pretty unspecific, so watch out for false positives. I'd recommend making the expression more specific.
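Putting both suggested fixes to work on the question's file name, a sketch (assuming .NET 6+ top-level statements) that runs the + pattern and the * variant and reads the two captured numbers:

```csharp
using System;
using System.Text.RegularExpressions;

var fileName = "Its Always Sunny in Philadelphia Season 02 Episode 01 - Charlie Gets Crippled.avi";

// \w+ lets S/E be followed by the rest of "Season"/"Episode"; \s+ allows one or more spaces.
var plusVersion = new Regex(@"S\w+\s+(\d+)\s+E\w+\s+(\d+)", RegexOptions.IgnoreCase);
// The * variant: \w* allows zero or more letters, \s? at most one space, \d\d exactly two digits.
var starVersion = new Regex(@"S\w*\s?(\d\d)\s?E\w*\s?(\d\d)", RegexOptions.IgnoreCase);

foreach (var regex in new[] { plusVersion, starVersion })
{
    Match match = regex.Match(fileName);
    if (match.Success)
    {
        int season = int.Parse(match.Groups[1].Value);   // "02" -> 2
        int episode = int.Parse(match.Groups[2].Value);  // "01" -> 1
        Console.WriteLine($"Season {season}, Episode {episode}");
    }
}
```

Both patterns skip the earlier "s"/"S" occurrences ("Its", "Always", "Sunny") because no digits follow them, and both end up matching "Season 02 Episode 01".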

Regex to match all Romanian phone numbers

I searched all over Google for a way to verify whether a phone number is Romanian but didn't find anything that helps.
I want a Regex validator for the following numbers format:
074xxxxxxx
075xxxxxxx
076xxxxxxx
078xxxxxxx
072xxxxxxx
077xxxxxxx
0251xxxxxx
0351xxxxxx
This is the regex that I've made, but it is not working:
{ "Romania", new Regex("(/^(?:(?:(?:00\\s?|\\+)40\\s?|0)(?:7\\d{2}\\s?\\d{3}\\s?\\d{3}|(21|31)\\d{1}\\s?\\d{3}\\s?\\d{3}|((2|3)[3-7]\\d{1})\\s?\\d$)")}
It doesn't validate the correct numbers format.
More details:
If the number begins with anything other than the prefixes listed above, it is not valid.
Each x can be any digit, but the x digits must not all be the same, e.g. 0000000 or 1111111.
It can also optionally have this format: (072)xxxxxxx
Is there any way of doing this?
I want to implement this to store these numbers in a database and check whether their format is Romanian.
This is the code where I need to add the regex; there should be a new entry named "Romania":
static IDictionary<string, Regex> countryRegex = new Dictionary<string, Regex>()
{
{ "USA", new Regex("^[2-9]\\d{2}-\\d{3}-\\d{4}$")},
{ "UK", new Regex("(^1300\\d{6}$)|(^1800|1900|1902\\d{6}$)|(^0[2|3|7|8]{1}[0-9]{8}$)|(^13\\d{4}$)|(^04\\d{2,3}\\d{6}$)")},
{ "Netherlands", new Regex("(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\\-\\s]{10}$)")},
};
If I understand the rules correctly, this pattern should work:
^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$
So, you could add it to your code like this:
static IDictionary<string, Regex> countryRegex = new Dictionary<string, Regex>()
{
{ "USA", new Regex("^[2-9]\\d{2}-\\d{3}-\\d{4}$")},
{ "UK", new Regex("(^1300\\d{6}$)|(^1800|1900|1902\\d{6}$)|(^0[2|3|7|8]{1}[0-9]{8}$)|(^13\\d{4}$)|(^04\\d{2,3}\\d{6}$)")},
{ "Netherlands", new Regex("(^\\+[0-9]{2}|^\\+[0-9]{2}\\(0\\)|^\\(\\+[0-9]{2}\\)\\(0\\)|^00[0-9]{2}|^0)([0-9]{9}$|[0-9\\-\\s]{10}$)")},
{ "Romania", new Regex(@"^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$")}
};
Here is the meaning of the pattern:
^ - Matches must start at the beginning of the input string
(?<paren>\()? - Optionally matches a ( character. If it is there, it captures it in a group named paren
0 - The number must start with a single 0
(?: - Begins a non-capturing group for the purpose of matching one of two different formats
(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6} - The first format
(?:72|74|75|76|77|78) - The next two digits must be 72, 74, 75, 76, 77, or 78
(?(paren)\)) - If the opening ( exists, then there must be a closing ) here
(?<first>\d) - Matches just the first of the ending seven digits and captures it in a group named first
(?!\k<first>{6}) - A negative look-ahead which ensures that the remaining six digits are not all the same as the first one
\d{6} - Matches the remaining six digits
| - The or operator
(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5} - The second format
(?:251|351) - The next three digits must be 251 or 351.
(?(paren)\)) - If the opening ( exists, then there must be a closing ) here
(?<first>\d) - Matches just the first of the ending six digits and captures it in a group named first
(?!\k<first>{5}) - A negative look-ahead which ensures that the remaining five digits are not all the same as the first one
\d{5} - Matches the remaining five digits
) - Ends the non-capturing group which specified the two potential formats
$ - The match must go all the way to the end of the input string
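A quick way to check the pattern against the formats in the question: a sketch assuming .NET 6+ top-level statements, with sample numbers invented for illustration and the expected results taken from the rules as stated in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The answer's pattern, written as a C# verbatim string.
var romania = new Regex(
    @"^(?<paren>\()?0(?:(?:72|74|75|76|77|78)(?(paren)\))(?<first>\d)(?!\k<first>{6})\d{6}|(?:251|351)(?(paren)\))(?<first>\d)(?!\k<first>{5})\d{5})$");

// Sample inputs paired with the result the stated rules call for.
var samples = new Dictionary<string, bool>
{
    ["0721234567"] = true,    // allowed 07x prefix, mixed digits
    ["(072)1234567"] = true,  // optional parentheses around the prefix
    ["0251123456"] = true,    // allowed 0251 prefix, mixed digits
    ["0720000000"] = false,   // trailing digits all identical
    ["0731234567"] = false,   // 073 is not an allowed prefix
    ["(0721234567"] = false,  // "(" without a matching ")"
};

foreach (var (number, expected) in samples)
{
    Console.WriteLine($"{number,-14} matched={romania.IsMatch(number)} expected={expected}");
}
```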
Try this one: ^(?=0[723][2-8]\d{7})(?!.*(.)\1{2,}).{10}$ - The negative lookahead (?!.*(.)\1{2,}) rejects any string in which the same character appears three or more times in a row.
I use http://regexr.com/ to test
This matches your examples:
0(([7][456728])|([23]51)).*

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = @"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end of variable r, I get True. If I add "=\w" after the \w, it's False. I want the characters between # and = to be able to be any alphanumeric value. Anything after the = sign can contain alphanumeric characters and ' (single quotes). What am I doing wrong here? I have rarely used regular expressions and can normally find an example, but this is a custom format, and even with cheat sheets I'm having trouble.
^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^ start of the string
([Ii]|[Oo]) either (I or i) or (O or o)
\w+ 1 or more word characters
= equals sign
(?<q>'?) capture 0 or 1 quotes in named group q
[\w\d]+ 1 or more word or digit characters
\k<q> repeat of what was captured in named group q
$ end of the string
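A sketch that runs this pattern against the question's two examples plus two unbalanced-quote cases (assuming .NET 6+ top-level statements; the last two inputs are made up):

```csharp
using System;
using System.Text.RegularExpressions;

// Named group q captures either one quote or nothing; \k<q> then demands the same thing
// right before the end, so an opening quote must have a closing quote and vice versa.
var paramRegex = new Regex(@"^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$");

foreach (var s in new[] { "I#parameter='test'", "O#paramname=2827", "I#paramname='test", "I#paramname=test'" })
{
    Console.WriteLine($"{s} -> {paramRegex.IsMatch(s)}");
}
```

The first two print True; the last two print False because their quotes are unbalanced.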
Use \w+ instead of \w to match one or more characters, or \w* to match zero or more.
Try this:
^([Ii]|[Oo])#\w+=\'*\w+\'*
If you want to be stricter and require the literal parameter name paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?
You could try something like this:
Regex rx = new Regex(@"^([IO])#(\w+)=(.*)$", RegexOptions.IgnoreCase);
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches one or more non-quote characters; the second matches a quoted string literal in which any embedded quotes are escaped by doubling them, so it would match things like
'' (the empty string)
'foo bar'
'That''s All, Folks!'
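A sketch combining the regex above with the stricter third group and a small unquoting step. It assumes .NET 6+ top-level statements; Unquote is a hypothetical helper, not part of the answer, and the inputs are the examples listed above plus one invented failing case:

```csharp
using System;
using System.Text.RegularExpressions;

// Group 3 is either an unquoted run, or a quoted literal in which '' stands for one quote.
var rx = new Regex(@"^([IO])#(\w+)=(([^']+)|('(('')|([^']+))*'))$", RegexOptions.IgnoreCase);

// Hypothetical helper: strips the outer quotes and un-doubles embedded quotes.
// It assumes its input already matched group 3, so a quoted value has length >= 2.
static string Unquote(string raw) =>
    raw.StartsWith("'") ? raw.Substring(1, raw.Length - 2).Replace("''", "'") : raw;

foreach (var s in new[] { "O#paramname=2827", "I#p=''", "i#p='foo bar'", "I#p='That''s All, Folks!'", "I#p='unterminated" })
{
    Match m = rx.Match(s);
    Console.WriteLine(m.Success
        ? $"{m.Groups[1].Value} {m.Groups[2].Value} = [{Unquote(m.Groups[3].Value)}]"
        : $"{s} -> no match");
}
```

The doubled quote in 'That''s All, Folks!' comes back as a single quote, and the unterminated literal is rejected by the pattern.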

Extracting one word based on special character using Regular Expression in C#

I am not very good at regular expressions but want to do something like this:
string input = "c test123 d split";
I want to split the string on "c" and "d". These can be any words, which I already have; the string itself will be given by the user. I want "test123" and "split" as my output. There can be any number of words, e.g. "c test123 d split e new". I already have c, d, and e, and I want just the word that follows each of them: after c I have test123, after d I have split, and after e I have new, so I need test123, split, and new. How can I do this? One more thing: I will pass c first, then d, then e, not all of them together. I tried:
string strSearchWord = "c ";
Regex testRegex1 = new Regex(strSearchWord);
List<string> lstValues = testRegex1.Split("c test123 d split").ToList();
But it only works for the last keyword: for d it gives the last word, but for c the result includes "test123 d split".
How should I do this?
The input might be:
string strSearchWord = "c mytest1 d newtest1 e lasttest1";
The split should be based on "c", "d", and "e". I will pass them one at a time.
or
string strSearchWord = "q 100 p 200 t 2000";
The split should be based on "q", "p", and "t". I will pass them one at a time.
or
string strSearchWord = "t 100 r pass";
The split should be based on "t" and "r". I will pass them one at a time.
or
string strSearchWord = "fi 100 se 2000 td 500 ft 200 fv 6000 lt thanks ";
The split should be based on "fi", "se", "td", "ft", "fv", and "lt". I will pass them one at a time.
I hope that's clear. Is any other detail needed?
string[] splitArray = null;
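splitArray = Regex.Split(subjectString, @"\s*\b(?:c|d)\b\s*");
will split the string along the "words" c or d, whether or not they are surrounded by whitespace, but only if they occur as entire words (hence the \b word-boundary anchors). The group is non-capturing, (?:...), because Regex.Split includes any captured text in the result array.
This gives you all the substrings between your words as an array; if the input starts with c, the first element is an empty string.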
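Since the keywords differ from input to input, the alternation can be built from a list at run time. A sketch assuming .NET 6+ top-level statements; the keywords array is a hypothetical stand-in for however the keywords are actually stored:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical keyword list; in practice these come from wherever the keywords are kept.
string[] keywords = { "c", "d", "e" };
string input = "c mytest1 d newtest1 e lasttest1";

// Regex.Escape treats each keyword literally; (?:...) keeps the keywords out of the result.
string pattern = @"\s*\b(?:" + string.Join("|", keywords.Select(Regex.Escape)) + @")\b\s*";

// Split leaves an empty first element because the input starts with a keyword; drop empties.
string[] parts = Regex.Split(input, pattern).Where(p => p.Length > 0).ToArray();
Console.WriteLine(string.Join(", ", parts)); // mytest1, newtest1, lasttest1
```

The \b anchors keep the e inside "mytest1" or "newtest1" from being treated as a separator.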
If you want to loop through the string manually, picking out each word after the search words one by one, you could use positive lookbehind:
string resultString = null;
resultString = Regex.Match(subjectString, @"(?<=\bc\b\s*)\w+").Value;
will find the word after c. Do the same for d ((?<=\bd\b\s*)\w+) etc.
This regex means:
(?<=\bc\b\s*): Assert that it is possible to match the "complete word" c, optionally followed by space characters, to the left of the current position in the string (positive lookbehind).
\w+: Then match one or more word characters (letters, digits, and _) that follow.
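Because the keyword is supplied at run time, it is worth escaping it before splicing it into the lookbehind. A sketch assuming .NET 6+ top-level statements; WordAfter is a hypothetical helper name, and it returns null when the keyword is not found:

```csharp
using System;
using System.Text.RegularExpressions;

// Returns the word following the whole-word `keyword` in `text`, or null if there is none.
static string? WordAfter(string text, string keyword)
{
    // Regex.Escape stops characters such as '.' or '(' in a keyword from acting as regex syntax.
    Match m = Regex.Match(text, @"(?<=\b" + Regex.Escape(keyword) + @"\b\s*)\w+");
    return m.Success ? m.Value : null;
}

string input = "fi 100 se 2000 td 500 ft 200 fv 6000 lt thanks ";
foreach (var key in new[] { "fi", "se", "td", "ft", "fv", "lt" })
{
    Console.WriteLine($"{key} -> {WordAfter(input, key)}");
}
```

This relies on .NET supporting variable-length lookbehind (the \s* inside it), which not every regex engine allows.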
Use regex groups.
The regex would be
c\s(.+?)\sd\s(.+)
and you would retrieve the groups like this:
Regex r = new Regex(@"c\s(.+?)\sd\s(.+)"); // \s is whitespace; the last group is greedy so it takes the rest of the string
r.Match("c test123 d split").Groups[1].Value // the 1st group: "test123"
r.Match("c test123 d split").Groups[2].Value // the 2nd group: "split"
r.Match("c test123 d split").Groups[0].Value // the whole match: "c test123 d split"
