Is my C# Reg-ex correct? - c#

Is this Regex correct if I have to match a string which is atleast 7 characters long, not more than 20 characters, has atleast 1 number, and atleast 1 letter? It has no other constraints.
[0-9]+[A-Za-z]+{7,20}
Thanks

No, it's not. The quantifier {7,20} doesn't apply to a token (repetition in regexes is done with quantifiers, like *, +, ? or the more general {n,m} – you cannot use more than one quantifier on a single token [in this case [a-zA-Z]]; *? is a quantifier on its own and thus doesn't play by above rules). You'll need something like the following:
^(?=.*\d)(?=.*[a-zA-Z]).{7,20}$
This has two lookaheads making sure of at least one digit and at least one letter:
(?=.*\d)
(?=.*[a-zA-Z])
Lookarounds are zero-width assertions; they do not consume characters in the string so they are merely matching a position. But they make sure that the expression inside of them would match at the current point. In this case this expression would match arbitrarily many characters and then would require a digit or a letter, respectively.
The actual match itself,
.{7,20}
just makes sure the length matches. What characters are used is irrelevant because we made sure of that constraints above already.
Finally the whole expression is anchored in that a start-of-string and end-of-string anchor are inserted at the start and end:
^...$
This makes sure that the match really encompasses the whole string. While not strictly necessary in this case (it would match the whole string anyway in all valid cases) it's often a good idea to include because usually regexes match only substrings and this can lead to subtle problems where validation regexes match even though they should fail. E.g. using \d+ to make sure a string consists only of digits would match the string a4b which puzzles beginners quite often.
I also changed that the order of letters and numbers doesn't matter. Your regex looks like it tries to impose a definite order where all numbers need to come before all letters which usually isn't what's wanted here.

Related

Underscore in regex not validating

How do I add underscore as a part of my regex string.
Here is my string that checks for uppercase, lowercase, numbers and special characters. The rest of the special characters work. Validation isn't working for underscores.
#"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W]){1,})(?=(.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"
Any ideas?
Thanks
This is the regex that AWS Cogito uses, it should apply to your situation:
#"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[\^$*.\[\]{}\(\)?\-“!##%&\/,><’:;|_~`])\S{8,99}$"
You can check regexes at http://regexstorm.net, it's faster than building your application everytime.
I've approached it like this: I took your requirements and made them into separate positive lookaheads:
Check for:
uppercase (?=.*[A-Z])
lowercase (?=.*[a-z]) (note that I broke A-Z and a-z up into separate groups)
numbers (?=.*\d)
special characters (?=.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])
You can then combine them in any order and I've combined them in the same order as I listed them above and anchored it with the beginning of the line using ^. Don't add any extra matches before, in-between or after the groups in your requirement that could cause the regex to enforce a certain ordering of the groups:
The lookahead for any non-word character \W makes it impossible to match Underscore1_ since it will only match on "anything other than a letter, digit or underscore" - which is all Underscore1_ contains.
The starting [^\s] (and ending [^\s]) that consumes one character is likely destroying a lot of good matches. Underscore1_ or _1scoreUnder shouldn't matter, but if you start with _ and consume it with [^\s] like you do, the later lookahead for a special character will fail (unless you have a second special character in the password).
#"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])"
If you have a minimum length requirement of, say, 7 characters, you just have to add .{7,}$ to the end of the regex, making it:
#"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]).{7,}$"
Without a minimum length, a password of one character from each group will be enough, and since there are 4 groups, a password with only 4 characters will pass the filter.
I see no point in putting an upper length limit into the regex. If the user interface has accepted a string that is thousands of characters long, then why reject it for being too long later? The length of what you store is probably going to be much smaller anyway since you'll be storing the bcrypt/scrypt/argon2/... encoded password.
Suggestion: Also add space (or even whitespaces) to the list of special characters.
In you regexp add underscore in 3rd Capturing Group regex101
#"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W_]){1,})(?=(.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"

Using RegEx, what's the best way to capture groups of digits, ignoring any whitespace in them

Given the following string...
ABC DEF GHI: 319 022 6543 QRS : 531 450
I'm trying to extract all ranges that start/end with a digit, and which may contain whitespace, but I want that whitespace itself removed.
For instance, the above should yield two results (since there are two 'ranges' that match what I aim looking for)...
3190226543
531450
My first thought was this, but this matches the spaces between the letters...
([\d\s])
Then I tried this, but it didn't seem to have any effect...
([\d+\s*])
This one comes close, but its grabbing the trailing spaces too. Also, this grabs the whitespace, but doesn't remove it.
(\d[\d\s]+)
If it's impossible to remove the spaces in a single statement, I can always post-process the groups if I can properly extract them. That most recent statement comes close, but how do I say it doesn't end with whitespace, but only a digit?
So what's the missing expression? Also, since sometimes people just post an answer, it would be helpful to explain out the RegEx too to help others figure out how to do this. I for one would love not just the solution, but an explanation. :)
Note: I know there can be some variations between RegEx on different platforms so that's fine if those differences are left up to the reader. I'm more interested in understanding the basic mechanics of the regex itself more so than the syntax. That said, if it helps, I'm using both Swift and C#.
You cannot get rid of whitespace from inside the match value within a single match operation. You will need to remove spaces as a post-processing step.
To match a string that starts with a digit and then optionally contains any amount of digits or whitespaces and then a digit you can use
\d(?:[\d\s]*\d)?
Details:
\d - a digit
(?:[\d\s]*\d)? - an optional non-capturing group matching
[\d\s]* - zero or more whitespaces / digits
\d - a digit.
See the regex demo.

Regex to match two or more consecutive characters

Using regular expressions I want to match a word which
starts with a letter
has english alpahbets
numbers, period(.), hyphen(-), underscore(_)
should not have two or more consecutive periods or hyphens or underscores
can have multiple periods or hyphens or underscore
For example,
flin..stones or flin__stones or flin--stones
are not allowed.
fl_i_stones or fli_st.ones or flin.stones or flinstones
is allowed .
So far My regular expression is ^[a-zA-Z][a-zA-Z\d._-]+$
So My question is how to do it using regular expression
You can use a lookahead and a backreference to solve this. But note that right now you are requiring at least 2 characters. The starting letter and another one (due to the +). You probably want to make that + and * so that the second character class can be repeated 0 or more times:
^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$
How does the lookahead work? Firstly, it's a negative lookahead. If the pattern inside finds a match, the lookahead causes the entire pattern to fail and vice-versa. So we can have a pattern inside that matches if we do have two consecutive characters. First, we look for an arbitrary position in the string (.*), then we match single (arbitrary) character (.) and capture it with the parentheses. Hence, that one character goes into capturing group 1. And then we require this capturing group to be followed by itself (referencing it with \1). So the inner pattern will try at every single position in the string (due to backtracking) whether there is a character that is followed by itself. If these two consecutive characters are found, the pattern will fail. If they cannot be found, the engine jumps back to where the lookahead started (the beginning of the string) and continue with matching the actual pattern.
Alternatively you can split this up into two separate checks. One for valid characters and the starting letter:
^[a-zA-Z][a-zA-Z\d._-]*$
And one for the consecutive characters (where you can invert the match result):
(.)\1
This would greatly increase the readability of your code (because it's less obscure than that lookahead) and it would also allow you to detect the actual problem in pattern and return an appropriate and helpful error message.

C# Regex Validation

Can someone please validate this for me (newbie of regex match cons).
Rather than asking the question, I am writing this:
Regex rgx = new Regex (#"^{3}[a-zA-Z0-9](\d{5})|{3}[a-zA-Z0-9](\d{9})$"
Can someone telll me if it's OK...
The accounts I am trying to match are either of:
1. BAA89345 (8 chars)
2. 12345678 (8 chars)
3. 123456789112 (12 chars)
Thanks in advance.
You can use a Regex tester. Plenty of free ones online. My Regex Tester is my current favorite.
Is the value with 3 characters then followed by digits always starting with three... can it start with less than or more than three. What are these mins and max chars prior to the digits if they can be.
You need to place your quantifiers after the characters they are supposed to quantify. Also, character classes need to be wrapped in square brackets. This should work:
#"^(?:[a-zA-Z0-9]{3}|\d{3}\d{4})\d{5}$"
There are several good, automated regex testers out there. You may want to check out regexpal.
Although that may be a perfectly valid match, I would suggest rewriting it as:
^([a-zA-Z]{3}\d{5}|\d{8}|\d{12})$
which requires the string to match one of:
[a-zA-Z]{3}\d{5} three alpha and five numbers
\d{8} 8 digits or
\d{12} twelve digits.
Makes it easier to read, too...
I'm not 100% on your objective, but there are a few problems I can see right off the bat.
When you list the acceptable characters to match, like with a-zA-Z0-9, you need to put it inside brackets, like [a-zA-Z0-9] Using a ^ at the beginning will negate the contained characters, e.g. `[^a-zA-Z0-9]
Word characters can be matched like \w, which is equivalent to [a-zA-Z0-9_].
Quantifiers need to appear at the end of the match expression. So, instead of {3}[a-zA-Z0-9], you would need to write [a-zA-Z0-9]{3} (assuming you want to match three instances of a character that matches [a-zA-Z0-9]

I need a Regular Expression allowing user to input numbers, plus, minus and parentheses

I need a Regular Expression allowing user to input numbers, plus, minus and parentheses.
User can only input:
At most one open parenthesis '('.
At most one close parenthesis ')'.
At most one plus '+'
As many minus '-' but not after each other.
Exactly 11 numbers.
Here are valid inputs:
(0)+12-3-4-56-7890
+)0(12345-678-90
+01234567890
+(01234567890)
01234567890
-01-234+5678-90
(01234567890)
)01234567890(
And following are not valid:
0123456--7890
0((1234567890
01234567890))
++01234567890
123456
++123456789
I'm using C# for programming and if it helps order of open and close parentheses can become mandatory too. so )01234567890( will not be valid.
Thanks in advance
This regex passes your examples, but might not be exactly what you're looking for. It should point you in the right direction.
^(?!.*-{2,})(?!(?:.*\)){2,})(?!(?:.*\(){2,})(?!\+{2,})(?:\D*\d\D*){11}$
(?!.*-{2,}) Cannot contain two or more hyphens.
(?!(?:.*)){2,}) Cannot contain two or more closing parentheses.
(?!(?:.*(){2,}) Cannot contain two or more opening parentheses.
(?!+{2,}) Cannot start with more than two addition symbols.
(?:\D*\d\D*){11} Must contain 11 instances of a numeric character surrounded by anything.
However, this is very confusing and fairly inefficient. I bet the regex could be rewritten to be much quicker, but won't be much easier to understand.
I suggest that you follow MisterJack's suggestion instead of pursue a regex. It'll be easier to maintain.
EDIT
^(?!.*--)(?!.*(\(|\)|\+).*\1)(?:\D*\d\D*){11}$
I've consolidated the parentheses and plus symbol rules into one negative lookahead using a backreference. This also restricts the number of parens and pluses to just one of each. I couldn't get it to restrict to just a certain set of characters, but you might be able to do that in a second pass with another regex.
^ Match from beginning of the string
(?!.*--) Do not allow consecutive hyphens
(?!.* ((|)|+).*\1) Do not allow two or more instances of () or +
(?:\D*\d\D*){11} Must contain 11 digits, allow non-digit characters before and after, such as hyphen.
$ Match to end of string
I tried a negative and positive lookahead to restrict the characters, but couldn't get it to work right. I also tried to replace \D with [()+-] but that didn't work either. Maybe someone else will add a comment to show how to restrict the characters. I'd sure love to see how someone else does it in this regex.
I think that a regular expression isn't your best bet, because it could become too much complicated and it can easily be broken.
What I suggest you is to try to parse your input, i.e. to count how many numbers, minuses, plus and parenthesis the user entered, and if they appear in the right order. An easy way to do this could be to loop over the characters that compose the string and check if the current char:
is a number (and we keep count of how many numbers we found)
is a minus (and the previous char isn't a minus)
is a plus (and it is the first one)
is a parenthesis (it's the first open parenthesis or it's a closed one and we already found the open parenthesis)
This could do the trick.

Categories