I need to format my telephone numbers in a specific way. Unfortunately, business rules prohibit me from doing this up front (separate input boxes, etc.).
The format needs to be +1-xxx-xxx-xxxx, where the "+1" is constant (we don't do business internationally).
Here is my regex pattern to test the input:
"\\D*([2-9]\\d{2})(\\D*)([2-9]\\d{2})(\\D*)(\\d{4})\\D*"
(which I stole from somewhere else)
Then I perform a regex.Replace() like so:
regex.Replace(telephoneNumber, "+1-$1-$3-$5"); // THIS IS WHERE IT BLOWS UP
If my telephone number already has the "+1" in the string, it prepends another one, so that I get +1-+1-xxx-xxx-xxxx.
Can someone please help?
You can add (?:\+1\D*)? to catch an optional prefix before the number. Because the prefix is then part of the match, it will be replaced if it's there.
You don't need to use \D* before and after the number. As they are optional, they don't affect whether the number is found (in a Replace they would only remove the surrounding non-digit text from the result).
You don't need to capture the parts that you won't use; that makes it easier to see what ends up in the replacement.
str = Regex.Replace(str, @"(?:\+1\D*)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})", "+1-$1-$2-$3");
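A minimal sketch of that replacement as a complete program, written with C# top-level statements (the helper name FormatPhone and the sample numbers are mine, for illustration). Because the optional (?:\+1\D*)? group is part of the match, an existing prefix gets replaced instead of being doubled:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the optional (?:\+1\D*)? group is part of the match,
// so an existing "+1" prefix is replaced instead of being kept and doubled.
static string FormatPhone(string input) =>
    Regex.Replace(input,
        @"(?:\+1\D*)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})",
        "+1-$1-$2-$3");

Console.WriteLine(FormatPhone("234.567.8901"));    // +1-234-567-8901
Console.WriteLine(FormatPhone("+1 234 567 8901")); // +1-234-567-8901
Console.WriteLine(FormatPhone("+1-234-567-8901")); // +1-234-567-8901
```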
You might consider using something more specific than \D* for the separators, though, for example [\- /]?. With an overly permissive pattern you risk catching something that's not a phone number, for example changing "I have 234 cats, 528 dogs and 4509 horses." into "I have +1-234-528-4509 horses."
str = Regex.Replace(str, @"(?:\+1[\- /]?)?([2-9]\d{2})[\- /]?([2-9]\d{2})[\- /]?(\d{4})", "+1-$1-$2-$3");
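To see the difference between the two patterns, here is a small side-by-side run on the sentence from the answer (the variable names are illustrative; the extra sample 234 567-8901 is mine):

```csharp
using System;
using System.Text.RegularExpressions;

const string LoosePattern  = @"(?:\+1\D*)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})";
const string StrictPattern = @"(?:\+1[\- /]?)?([2-9]\d{2})[\- /]?([2-9]\d{2})[\- /]?(\d{4})";
const string Replacement   = "+1-$1-$2-$3";

string sentence = "I have 234 cats, 528 dogs and 4509 horses.";

// \D* lets the loose pattern jump over arbitrary text between the digit groups.
string loose = Regex.Replace(sentence, LoosePattern, Replacement);
// The strict pattern allows at most one '-', ' ' or '/' between the groups.
string strict = Regex.Replace(sentence, StrictPattern, Replacement);

Console.WriteLine(loose);  // I have +1-234-528-4509 horses.
Console.WriteLine(strict); // I have 234 cats, 528 dogs and 4509 horses.
Console.WriteLine(Regex.Replace("234 567-8901", StrictPattern, Replacement)); // +1-234-567-8901
```

Keep in mind that the stricter class [\- /] contains neither parentheses nor dots, so an input like (234) 567-8901 is left unchanged by the second pattern; add those characters to the class if such formats must be accepted.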
Try something like this to make things more readable:
Regex rxPhoneNumber = new Regex( @"
^ # anchor the start-of-match to start-of-text
\D* # - allow and ignore any leading non-digits
1? # - we'll allow (and ignore) a leading 1 (as in 1-800-123-4567)
\D* # - allow and ignore any non-digits following that
(?<areaCode>[2-9]\d\d) # - required 3-digit area code
\D* # - allow and ignore any non-digits following the area code
(?<exchangeCode>[2-9]\d\d) # - required 3-digit exchange code (central office)
\D* # - allow and ignore any non-digits following the C.O.
(?<subscriberNumber>\d\d\d\d) # - required 4-digit subscriber number
\D* # - allow and ignore any non-digits following the subscriber number
$ # - followed by end-of-text.
" ,
RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture
);
string input = "voice: 1 (234) 567/1234 (leave a message)" ;
bool isValid = rxPhoneNumber.IsMatch(input) ;
string tidied = rxPhoneNumber.Replace( input , "+1-${areaCode}-${exchangeCode}-${subscriberNumber}" ) ;
which gives tidied the desired value:
+1-234-567-1234
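If you also want to read the individual parts, or reject input that doesn't validate, Match.Groups gives access to the named groups. A sketch, assuming the same pattern condensed onto one line so IgnorePatternWhitespace is not needed (the Normalize helper and the second sample are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Same pattern as above on one line; ExplicitCapture keeps only the named groups.
var rxPhoneNumber = new Regex(
    @"^\D*1?\D*(?<areaCode>[2-9]\d\d)\D*(?<exchangeCode>[2-9]\d\d)\D*(?<subscriberNumber>\d\d\d\d)\D*$",
    RegexOptions.ExplicitCapture);

// Hypothetical helper: returns the normalized number, or null when the input does not validate.
string? Normalize(string input)
{
    Match m = rxPhoneNumber.Match(input);
    if (!m.Success) return null;
    return $"+1-{m.Groups["areaCode"].Value}-{m.Groups["exchangeCode"].Value}-{m.Groups["subscriberNumber"].Value}";
}

Console.WriteLine(Normalize("voice: 1 (234) 567/1234 (leave a message)")); // +1-234-567-1234
Console.WriteLine(Normalize("123-456-7890") ?? "invalid");                  // invalid
```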
You can use the following regex
\D*(\+1-)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})\D*
And the replacement string:
$1$2-$3-$4
Here is a demo
This is an adjustment of the regex you had. If you need to match whole numbers only, I'd use
(\+1-)?\b([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})\b
See demo 2
Also, if the hyphen in \+1- is optional, add a ? after it: \+1-?.
To make the regex safer, I'd replace \D* (0 or more non-digit symbols) with a character class containing the known separators, e.g. [ /-]* (matching slashes, spaces and hyphens).
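A small check of the replacement string with the first pattern of this answer (the inputs are made up for illustration): $1 only reproduces the +1- prefix when the input already contains it. If the prefix should always appear, as the question asks, a replacement such as +1-$2-$3-$4 (my assumption, not part of the answer) writes it unconditionally:

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"\D*(\+1-)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})\D*";

foreach (string input in new[] { "+1-234-567-8901", "(234) 567-8901" })
{
    // "$1$2-$3-$4" re-emits whatever the optional group captured, possibly nothing.
    string keepPrefix = Regex.Replace(input, Pattern, "$1$2-$3-$4");
    // "+1-$2-$3-$4" always writes the constant prefix.
    string forcePrefix = Regex.Replace(input, Pattern, "+1-$2-$3-$4");
    Console.WriteLine($"{input} -> {keepPrefix} | {forcePrefix}");
}
```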
I'm programming a web application with ASP.NET MVC and C#.
In a form, a user should enter a name, a street name and a city in separate fields.
Start: the entered value has to start with an 'alphabetic' character, no matter whether the language is English, Chinese, French or whatever. So things like é and Chinese characters are OK, but characters like *¥##1 are not allowed.
Middle: the same as the start, plus spaces (but not two spaces in a row).
End: the same as the start.
This is correct:
A b c
Abcd ef
Abcdef
This is not correct:
1abc
A1 bc
1 2 3
a b c (space at the start)
Question:
What is the correct regex for this?
How can I set the length?
In a second case, I want to allow the digits 0123456789 too (in the same way as the letters).
This is what I have: '^[a-zA-Z][a-zA-Z ][a-zA-Z]$'
Thank you
You want to validate strings that only contain letter words separated with a single space between them.
You may use a regex like
^\p{L}+(?: \p{L}+)*$
Or, if any whitespace is allowed:
^\p{L}+(?:\s\p{L}+)*$
See the regex demo
To make it only match strings of 3 or more chars, add a (?=.{3}) lookahead right after the ^ anchor:
^(?=.{3})\p{L}+(?:\s\p{L}+)*$
Details
^ - start of a string
(?=.{3}) - a positive lookahead that requires any 3 chars immediately after the start of the string
\p{L}+ - 1 or more Unicode letters
(?:\s\p{L}+)* - zero or more repetitions of
\s - any whitespace
\p{L}+ - 1 or more Unicode letters
$ - end of string
Note that if you need to use it in ASP.NET, only use this regex to validate on the server side, as on the client side, this pattern might not be correctly handled by JavaScript regex.
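A quick server-side check of the 3-or-more-characters pattern against the examples from the question (北京 上海, ab and A  b with two spaces are extra samples I added for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

// Letter words separated by single whitespace characters, at least 3 chars in total.
var namePattern = new Regex(@"^(?=.{3})\p{L}+(?:\s\p{L}+)*$");

string[] valid = { "A b c", "Abcd ef", "Abcdef", "北京 上海" };
string[] invalid = { "1abc", "A1 bc", "1 2 3", " a b c", "ab", "A  b" };

foreach (string s in valid)
    Console.WriteLine($"'{s}' -> {namePattern.IsMatch(s)}"); // each prints True
foreach (string s in invalid)
    Console.WriteLine($"'{s}' -> {namePattern.IsMatch(s)}"); // each prints False
```

If you also need an upper limit, a lookahead anchored at the end, such as (?=.{3,50}$) (50 being an arbitrary example), bounds the length on both sides.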
You can use this regex:
^(?:\p{L}+ )*\p{L}+$
\p{L} matches all unicode code points that are in the "Letters" category.
The regex matches zero or more repetitions of \p{L}+ followed by a space (one or more letters plus a space) and then requires one or more letters at the end.
Demo
Example code:
Console.WriteLine(Regex.IsMatch("abc def", @"^(?:\p{L}+ )*\p{L}+$"));
I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string, where the following conditions should be met:
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12(Capital and normal alphabets and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but not in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If none of these occurs, we want a match for ^[\w ]+$, which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
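[\w ]+$ Match one or more characters from the set [\w ] up to the end of the input string
Note: in many regex flavors, \w is a shorthand for [a-zA-Z0-9_] (depending on their Unicode options); in .NET, however, it also matches Unicode letters and digits by default, unless RegexOptions.ECMAScript is set.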
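A small illustration of the \w behavior in .NET (the sample Amélie_1 is made up): by default \w accepts accented letters, while RegexOptions.ECMAScript limits it to ASCII word characters:

```csharp
using System;
using System.Text.RegularExpressions;

const string Pattern = @"^(?!(?:\d+|_+| +)$)[\w ]+$";

// Default .NET semantics: \w includes Unicode letters such as 'é'.
bool unicode = Regex.IsMatch("Amélie_1", Pattern);
// RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9].
bool ascii = Regex.IsMatch("Amélie_1", Pattern, RegexOptions.ECMAScript);

Console.WriteLine(unicode); // True
Console.WriteLine(ascii);   // False
```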
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w (when used for a server-side validation, and thus parsed with the .NET regex engine) will also allow any Unicode letters, digits and some more stuff when validating on the server side, so I'd rather stick to [A-Za-z0-9_] to be consistent with both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$).
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
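To see why the lookaround-only attempt fails in a data annotation, you can call RegularExpressionAttribute.IsValid directly. This sketch assumes a modern .NET runtime where System.ComponentModel.DataAnnotations is available; the inputs are the examples from the question plus ameya! as a special-character case:

```csharp
using System;
using System.ComponentModel.DataAnnotations;

// The lookaround-only attempt from the question, and the pattern from this answer.
var attempt = new RegularExpressionAttribute(@"(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)");
var fixedRule = new RegularExpressionAttribute(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

string[] inputs = { "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123", "123", "_", "   ", "ameya!" };

foreach (string s in inputs)
{
    // IsValid only accepts a value when the match spans the whole string,
    // so the zero-length match produced by pure lookarounds is rejected.
    Console.WriteLine($"'{s}': attempt={attempt.IsValid(s)} fixed={fixedRule.IsValid(s)}");
}
```

With these inputs, attempt is false for every value, while fixedRule accepts the first five and rejects the last four. Note that IsValid treats null and empty strings as valid, so pair it with [Required] if an empty field must be rejected.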
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy-looking explanation of the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores and spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*
I am new to working with regexes in C# .NET. Say I have a string as follows...
"Working on log #4"
And within this string we can expect to see the number (4) vary. How can I use a regex to extract only that number from the string?
I want to make sure that the string matches the first part:
"Working on log #"
And then extract the integer from it.
Also, I know that I could do this using string.Split(), .Substring(), etc. I just wanted to know how I might use regexes to do this.
Thanks!
"Working on log #(\d+)"
The () create a match group, so you will be able to extract that section.
The \d matches any digit.
The + says "look at the previous token, match it one or more times" so it will make it match one or more digits.
So overall you're capturing a group containing one or more digits, where that group comes after "Working on log #"
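A minimal sketch of pulling the captured digits out with Regex.Match (the sample line "Working on log #42" is illustrative). int.TryParse guards against digit runs that don't fit in an int:

```csharp
using System;
using System.Text.RegularExpressions;

string line = "Working on log #42";

// Group 1 holds the digits captured by (\d+).
Match m = Regex.Match(line, @"Working on log #(\d+)");
if (m.Success && int.TryParse(m.Groups[1].Value, out int logNumber))
{
    Console.WriteLine(logNumber); // 42
}
```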
Regex rgx = new Regex("Working on log #[0-9]"); is the pattern you want to use. The first part is a string literal, and [0-9] says that character can be any digit 0 through 9. If you allow multiple digits, change it to [0-9]{x}, where x is the number of repetitions, or to [0-9]+, as a + after any character means 1 or more of that character is allowed.
You could also just do string.StartsWith("Working on log #") then split on # and use int.TryParse() with the second value to confirm it is in fact a valid integer.
Try this: (?<=^Working on log #)\d+$. The match is only the number, so there is no need for a capture group. Remove ^ and $ if this is within a larger string.
(?<=...) - positive lookbehind - ensures that what is between = and ) is found immediately before the current position
^ - start of string (inside the lookbehind, so the text must begin with "Working on log #")
\d+ - at least one digit
$ - end of string
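A quick check of where ^ sits relative to the lookbehind. With ^ placed before (?<=...), the lookbehind would have to find text before the start of the string, which is impossible, so that form never matches; with ^ inside the lookbehind it behaves as intended:

```csharp
using System;
using System.Text.RegularExpressions;

string line = "Working on log #4";

// ^ outside the lookbehind: nothing can precede the start of the string.
Match outside = Regex.Match(line, @"^(?<=Working on log #)\d+$");
// ^ inside the lookbehind: the prefix itself must start the string.
Match inside = Regex.Match(line, @"(?<=^Working on log #)\d+$");

Console.WriteLine(outside.Success); // False
Console.WriteLine(inside.Value);    // 4
```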
A capturing group is the solution:
"Working on log #(?<Number>[0-9]+)"
Then you can access the matched groups using the Match.Groups property.
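A short sketch of reading the named group through Match.Groups (the sample lines are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

var logPattern = new Regex("Working on log #(?<Number>[0-9]+)");

foreach (string line in new[] { "Working on log #4", "Working on log #123", "Reading log #7" })
{
    Match m = logPattern.Match(line);
    // Groups["Number"] looks the group up by the name given in (?<Number>...).
    Console.WriteLine(m.Success ? m.Groups["Number"].Value : "no match");
}
```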
I have a strange issue with Regex in my .NET project. Please see the C# code below:
const string PATTERN = @"^[a-zA-Z]([-\s\.a-zA-Z]*('(?!'))?[-\s\.a-zA-Z]*)*$";
const string VALUE = "Ingebrigtsen Myre (Øvre)";
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(PATTERN);
if (!regex.IsMatch(VALUE)) // <--- Infinite loop here
return string.Empty;
// Some other code
I use this pattern to validate all types of names (first names, last names, middle names, etc.). Value is a parameter, but I provided it as a constant above, because the issue is rarely reproduced: only with special symbols such as *, (, ), etc. (sorry, but I don't have the full list of these symbols).
Can you help me to fix this infinite loop? Thanks for any help.
Added: this code sits at the very base level of the project, and I don't want to do any refactoring there; I just want a quick fix for this issue.
Added 2: I do know that technically it is not a loop; I meant that regex.IsMatch(VALUE) never returns. I waited for about an hour and it was still executing.
Your non-trivial regex, ^[a-zA-Z]([-\s\.a-zA-Z]*('(?!'))?[-\s\.a-zA-Z]*)*$, is better written with comments in free-spacing mode, like so:
Regex re_orig = new Regex(@"
^ # Anchor to start of string.
[a-zA-Z] # First char must be letter.
( # $1: Zero or more additional parts.
[-\s\.a-zA-Z]* # Zero or more valid name chars.
( # $2: optional quote.
' # Allow quote but only
(?!') # if not followed by quote.
)? # End $2: optional quote.
[-\s\.a-zA-Z]* # Zero or more valid name chars.
)* # End $1: Zero or more additional parts.
$ # Anchor to end of string.
",RegexOptions.IgnorePatternWhitespace);
In English, this regex essentially says: "Match a string that begins with an alpha letter [a-zA-Z] followed by zero or more alpha letters, whitespaces, periods, hyphens or single quotes, but each single quote may not be immediately followed by another single quote."
Note that your above regex allows oddball names such as: "ABC---...'... -.-.XYZ " which may or may not be what you need. It also allows multi-line input and strings that end with whitespace.
The "infinite loop" problem with the above regex is catastrophic backtracking, which occurs when this regex is applied to a long input that fails to match, for example one containing a character outside the allowed set (such as the parentheses and Ø in your example) or two single quotes in a row. Here is an equivalent pattern which matches (and fails to match) exactly the same strings, but does not experience catastrophic backtracking:
Regex re_fixed = new Regex(@"
^ # Anchor to start of string.
[a-zA-Z] # First char must be letter.
[-\s.a-zA-Z]* # Zero or more valid name chars.
(?: # Zero or more isolated single quotes.
' # Allow single quote but only
(?!') # if not followed by single quote.
[-\s.a-zA-Z]* # Zero or more valid name chars.
)* # Zero or more isolated single quotes.
$ # Anchor to end of string.
",RegexOptions.IgnorePatternWhitespace);
And here it is in short form in your code context:
const string PATTERN = @"^[a-zA-Z][-\s.a-zA-Z]*(?:'(?!')[-\s.a-zA-Z]*)*$";
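One way to convince yourself that the rewrite is safe is to compare both patterns on a few short inputs, where even the original finishes quickly, and then run only the fixed pattern on the problem input. The sample names are made up:

```csharp
using System;
using System.Text.RegularExpressions;

const string OriginalPattern = @"^[a-zA-Z]([-\s\.a-zA-Z]*('(?!'))?[-\s\.a-zA-Z]*)*$";
const string FixedPattern    = @"^[a-zA-Z][-\s.a-zA-Z]*(?:'(?!')[-\s.a-zA-Z]*)*$";

// Short inputs: both patterns finish quickly here and should agree.
string[] samples = { "O'Neil", "Mary-Jane", "Smith Jr.", "O''Neil", "1abc", "A" };
foreach (string s in samples)
{
    bool a = Regex.IsMatch(s, OriginalPattern);
    bool b = Regex.IsMatch(s, FixedPattern);
    Console.WriteLine($"{s,-10} original={a} fixed={b}");
}

// The problem input: each character can be consumed in only one way,
// so the fixed pattern fails fast instead of backtracking exponentially.
Console.WriteLine(Regex.IsMatch("Ingebrigtsen Myre (Øvre)", FixedPattern)); // False
```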
Look at this part of your regex:
( [-\s\.a-zA-Z]* ('(?!'))? [-\s\.a-zA-Z]* )*$
( - the outer group (repeated, see the final *)
first [-\s\.a-zA-Z]* - this character class repeats any number of times
('(?!'))? - this group is optional
second [-\s\.a-zA-Z]* - this character class also repeats any number of times
)* - the outer group repeats any number of times
That means that as soon as your input string contains a character that's not in the character class (like the brackets and non-ASCII letter in your example), the preceding characters will be tried in a lot of permutations whose number increases exponentially with the length of the string.
To avoid that (and to allow the regex to fail faster), use atomic groups:
const string PATTERN = @"^[a-zA-Z](?>(?>[-\s\.a-zA-Z]*)(?>'(?!'))?(?>[-\s\.a-zA-Z])*)*$";
You've got an "any number of any number" here:
...[-\s\.a-zA-Z]*)*
and because your input doesn't match, the engine backtracks to try all permutations of dividing the input up, and the number of attempts grows exponentially with the length of the input.
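You can fix it by making that quantifier possessive, so that once the characters are consumed the engine will not backtrack to find other combinations. In flavors such as Ruby or PCRE that means adding a "+" ([-\s\.a-zA-Z]*+), but .NET does not support possessive quantifiers (it rejects *+ as a nested quantifier and throws an ArgumentException), so in C# use the equivalent atomic group instead:
const string PATTERN = @"^[a-zA-Z]([-\s\.a-zA-Z]*('(?!'))?(?>[-\s\.a-zA-Z]*))*$";
You can see a live demo (on Rubular, which uses Ruby's engine and therefore accepts the possessive +) demonstrating that making the quantifier possessive fixed the loop problem and still matches input that doesn't have the odd characters.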
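A sketch to check this on .NET, assuming a current runtime: the possessive form is rejected by the parser, while the atomic-group version compiles, still accepts a valid name and fails quickly on the problem input:

```csharp
using System;
using System.Text.RegularExpressions;

// Possessive syntax (valid in Ruby/PCRE): .NET's parser rejects it.
bool possessiveRejected = false;
try
{
    _ = new Regex(@"^[a-zA-Z]([-\s\.a-zA-Z]*('(?!'))?[-\s\.a-zA-Z]*+)*$");
}
catch (ArgumentException)
{
    possessiveRejected = true; // nested quantifier
}

// Same idea expressed with an atomic group, which .NET does support.
var atomic = new Regex(@"^[a-zA-Z]([-\s\.a-zA-Z]*('(?!'))?(?>[-\s\.a-zA-Z]*))*$");

Console.WriteLine(possessiveRejected);                         // True
Console.WriteLine(atomic.IsMatch("O'Neil"));                   // True
Console.WriteLine(atomic.IsMatch("Ingebrigtsen Myre (Øvre)")); // False
```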
Guys, I hate regex and I suck at writing them.
I have a string that is space separated and contains several codes that I need to pull out. Each code is marked by beginning with a capital letter and ending with a number. The code is only two digits.
I'm trying to create an array of strings from the initial string and I can't get the regular expression right.
Here is what I have
String[] test = Regex.Split(originalText, "([a-zA-Z0-9]{2})");
I also tried:
String[] test = Regex.Split(originalText, "([A-Z]{1}[0-9]{1})");
I don't have any experience with Regex as I try to avoid writing them whenever possible.
Anyone have any suggestions?
Example input:
AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80
I need to pull out F7, A4, B7. E0 should be ignored.
You want to collect the results, not split on them, right?
Regex regexObj = new Regex(@"\b[A-Z][0-9]\b");
MatchCollection allMatchResults = regexObj.Matches(subjectString);
should do this. The \bs are word boundaries, making sure that only whole words (like A1) are extracted, not substrings (like the A1 in TWA101).
If you also need to exclude "words" with non-word characters in them (like E0.M80 in your comment), you need to define your own word boundary, for example:
Regex regexObj = new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");
Now A1 only matches when surrounded by whitespace (or start/end-of-string positions).
Explanation:
(?<= # Assert that we can match the following before the current position:
^ # Start of string
| # or
\s # whitespace.
)
[A-Z] # Match an uppercase ASCII letter
[0-9] # Match an ASCII digit
(?= # Assert that we can match the following after the current position:
\s # Whitespace
| # or
$ # end of string.
)
If you also need to find non-ASCII letters/digits, you can use
\p{Lu}\p{N}
instead of [A-Z][0-9]. This finds an uppercase Unicode letter followed by any Unicode number character (like Ä٣), but I guess that's not really what you're after, is it?
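Putting both ASCII patterns from this answer side by side on the example input from the question (a sketch; the Cast/Select/ToArray calls just turn the MatchCollection into a string array):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string originalText = "AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80";

// \b treats '.' as a word boundary, so E0 in "E0.M80" is still found.
string[] withWordBoundaries = Regex.Matches(originalText, @"\b[A-Z][0-9]\b")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Custom boundaries: only whitespace or the ends of the string may surround a code.
string[] withWhitespace = Regex.Matches(originalText, @"(?<=^|\s)[A-Z][0-9](?=\s|$)")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join(", ", withWordBoundaries)); // F7, A4, Y7, B7, E0
Console.WriteLine(string.Join(", ", withWhitespace));     // F7, A4, Y7, B7
```

Y7 also fits the rule stated in the question (a capital letter followed by a digit), so it is returned as well.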
Do you mean that each code looks like "A00"?
Then this is the regex:
"[A-Z][0-9][0-9]"
Very simple... By the way, there's no point writing {1} in a regex: [0-9]{1} means "match exactly one digit", which is exactly like writing [0-9].
Don't give up, simple regexes make perfect sense.
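This should be OK:
string[] all_codes = Regex.Matches(originalText, @"\b[A-Z]\d\b").Cast<Match>().Select(m => m.Value).ToArray();
It gives you an array of all codes consisting of a capital letter followed by a digit, delimited by any kind of word boundary (space, etc.). Regex.Split would instead return the text between the codes, not the codes themselves (Cast and Select need using System.Linq).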