Underscore in regex not validating

Underscore in regex not validating - c#

How do I add underscore as a part of my regex string.
Here is my string that checks for uppercase, lowercase, numbers and special characters. The rest of the special characters work. Validation isn't working for underscores.
#"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W]){1,})(?=(.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"
Any ideas?
Thanks

This is the regex that AWS Cogito uses, it should apply to your situation:
#"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[\^$*.\[\]{}\(\)?\-“!##%&\/,><’:;|_~`])\S{8,99}$"
You can check regexes at http://regexstorm.net, it's faster than building your application everytime.

I've approached it like this: I took your requirements and made them into separate positive lookaheads:
Check for:
uppercase (?=.*[A-Z])
lowercase (?=.*[a-z]) (note that I broke A-Z and a-z up into separate groups)
numbers (?=.*\d)
special characters (?=.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])
You can then combine them in any order and I've combined them in the same order as I listed them above and anchored it with the beginning of the line using ^. Don't add any extra matches before, in-between or after the groups in your requirement that could cause the regex to enforce a certain ordering of the groups:
The lookahead for any non-word character \W makes it impossible to match Underscore1_ since it will only match on "anything other than a letter, digit or underscore" - which is all Underscore1_ contains.
The starting [^\s] (and ending [^\s]) that consumes one character is likely destroying a lot of good matches. Underscore1_ or _1scoreUnder shouldn't matter, but if you start with _ and consume it with [^\s] like you do, the later lookahead for a special character will fail (unless you have a second special character in the password).
#"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])"
If you have a minimum length requirement of, say, 7 characters, you just have to add .{7,}$ to the end of the regex, making it:
#"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]).{7,}$"
Without a minimum length, a password of one character from each group will be enough, and since there are 4 groups, a password with only 4 characters will pass the filter.
I see no point in putting an upper length limit into the regex. If the user interface has accepted a string that is thousands of characters long, then why reject it for being too long later? The length of what you store is probably going to be much smaller anyway since you'll be storing the bcrypt/scrypt/argon2/... encoded password.
Suggestion: Also add space (or even whitespaces) to the list of special characters.

In you regexp add underscore in 3rd Capturing Group regex101
#"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W_]){1,})(?=(.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"

Related

Update regular expression to only allow single spaces [duplicate]

Right now I have a regex that prevents the user from typing any special characters. The only allowed characters are A through Z, 0 through 9 or spaces.
I want to improve this regex to prevent the following:
No leading/training spaces - If the user types one or more spaces before or after the entry, do not allow.
No double-spaces - If the user types the space key more than once, do not allow.
The Regex I have right now to prevent special characters is as follows and appears to work just fine, which is:
^[a-zA-Z0-9 ]+$
Following some other ideas, I tried all these options but they did not work:
^\A\s+[a-zA-Z0-9 ]+$\A\s+
/s*^[a-zA-Z0-9 ]+$/s*
Could I get a helping hand with this code? Again, I just want letters A-Z, numbers 0-9, and no leading or trailing spaces.
Thanks.

You can use the following regex:
^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$
See regex demo.
The regex will match alphanumerics at the start (1 or more) and then zero or more chunks of a single space followed with one or more alphanumerics.
As an alternative, here is a regex based on lookaheads (but is thus less efficient):
^(?!.* {2})(?=\S)(?=.*\S$)[a-zA-Z0-9 ]+$
See the regex demo
The (?!.* {2}) disallows consecutive spaces and (?=.*\S$) requires a non-whitespace to be at the end of the string and (?=\S) requires it at the start.

C# Regex boundary with special characters

I want to have a Regex that finds "Attributable".
I tried #"\bAttributable\b" but the \b boundary doesn't work with special characters.
For example, it wouldn't differentiate Attributable and Non-Attributable. Is there any way to Regex for Attributable and not it's negative?

Do a negative look-behind?
(?<!-)\bAttributable\b
Obviously this only checks for -s. If you want to check for other characters, put them in a character class in the negative look-behind:
(?<![-^])\bAttributable\b
Alternatively, if you just want to not match Non-Attributable but do match SomethingElse-Attributable, then put Non- in the look-behind:
(?<!Non-)\bAttributable\b

There are several ways to fix the issue like you have but it all depends on the real requirements. It is sometimes necessary to precise what "word boundary" you need in each concrete case, since \b word boundary is 1) context dependent, and 2) matches specific places in the string that you should be aware of:
Before the first character in the string, if the first character is a
word character.
After the last character in the string, if the last
character is a word character.
Between two characters in the string,
where one is a word character and the other is not a word character.
Now, here are several approaches that you may follow:
When you only care about compound words usually joined with hyphens (similar #Sweeper's answer): (?<!-)\bAttributable\b(?!-)
Only match between whitespaces or start/end of string: (?<!\S)Attributable(?!\S). NOTE: Actually, if it is what you want, you may do without a regex by using s.Split().Contains("Attributable")
Only match if not preceded with punctuation and there is no letter/digit/underscore right after: (?<!\p{P})Attributable\b
Only match if not preceded with punctation symbols but some specific ones (say, you want to match the word after a comma and a colon): (?<![^\P{P},;])Attributable\b.

Explain the Regex mentioned

Can any one please explain the regex below, this has been used in my application for a very long time even before I joined, and I am very new to regex's.
/^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$/
As far as I understand
this regex will validate
- for a minimum of 6 chars to a maximum of 10 characters
- will escape the characters like ^ and $
also, my basic need is that I want a regex for a minimum of 6 characters with 1 character being a digit and the other one being a special character.

^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$
^ is called an "anchor". It basically means that any following text must be immediately after the "start of the input". So ^B would match "B" but not "AB" because in the second "B" is not the first character.
.* matches 0 or more characters - any character except a newline (by default). This is what's known as a greedy quantifier - the regex engine will match ("consume") all of the characters to the end of the input (or the end of the line) and then work backwards for the rest of the expression (it "gives up" characters only when it must). In a regex, once a character is "matched" no other part of the expression can "match" it again (except for zero-width lookarounds, which is coming next).
(?=.{6,10}) is a lookahead anchor and it matches a position in the input. It finds a place in the input where there are 6 to 10 characters following, but it does not "consume" those characters, meaning that the following expressions are free to match them.
(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) is another lookahead anchor. It matches a position in the input where the following text contains four letters ([a-zA-Z] matches one lowercase or uppercase letter), but any number of other characters (including zero characters) may be between them. For example: "++a5b---C#D" would match. Again, being an anchor, it does not actually "consume" the matched characters - it only finds a position in the text where the following characters match the expression.
(?=.*\d.*\d) Another lookahead. This matches a position where two numbers follow (with any number of other characters in between).
.* Already covered this one.
$ This is another kind of anchor that matches the end of the input (or the end of a line - the position just before a newline character). It says that the preceding expression must match characters at the end of the string. When ^ and $ are used together, it means that the entire input must be matched (not just part of it). So /bcd/ would match "abcde", but /^bcd$/ would not match "abcde" because "a" and "e" could not be included in the match.
NOTE
This looks like a password validation regex. If it is, please note that it's broken. The .* at the beginning and end will allow the password to be arbitrarily longer than 10 characters. It could also be rewritten to be a bit shorter. I believe the following will be an acceptable (and slightly more readable) substitute:
^(?=(.*[a-zA-Z]){4})(?=(.*\d){2}).{6,10}$
Thanks to #nhahtdh for pointing out the correct way to implement the character length limit.

Check Cyborgx37's answer for the syntax explanation. I'll do some explanation on the meaning of the regex.
^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$
The first .* is redundant, since the rest are zero-width assertions that begins with any character ., and .* at the end.
The regex will match minimum 6 characters, due to the assertion (?=.{6,10}). However, there is no upper limit on the number of characters of the string that the regex can match. This is because of the .* at the end (the .* in the front also contributes).
This (?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) part asserts that there are at least 4 English alphabet character (uppercase or lowercase). And (?=.*\d.*\d) asserts that there are at least 2 digits (0-9). Since [a-zA-Z] and \d are disjoint sets, these 2 conditions combined makes the (?=.{6,10}) redundant.
The syntax of .*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z] is also needlessly verbose. It can be shorten with the use of repetition: (?:.*[a-zA-Z]){4}.
The following regex is equivalent your original regex. However, I really doubt your current one and this equivalent rewrite of your regex does what you want:
^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).*$
More explicit on the length, since clarity is always better. Meaning stay the same:
^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).{6,}$
Recap:
Minimum length = 6
No limit on maximum length
At least 4 English alphabet, lowercase or uppercase
At least 2 digits 0-9

REGEXPLANATION
/.../: slashes are often used to represent the area where the regex is defined
^: matches beginning of input string
.: this can match any character
*: matches the previous symbol 0 or more times
.{6,10}: matches .(any character) somewhere between 6 and 10 times
[a-zA-Z]: matches all characters between a and z and between A and Z
\d: matches a digit.
$: matches the end of input.
I think that just about does it for all the symbols in the regex you've posted

For your regex request, here is what you would use:
^(?=.{6,}$)(?=.*?\d)(?=.*?[!##$%&*()+_=?\^-]).*
And here it is unrolled for you:
^ // Anchor the beginning of the string (password).
(?=.{6,}$) // Look ahead: Six or more characters, then the end of the string.
(?=.*?\d) // Look ahead: Anything, then a single digit.
(?=.*?[!##$%&*()+_=?\^-]) // Look ahead: Anything, and a special character.
.* // Passes our look aheads, let's consume the entire string.
As you can see, the special characters have to be explicitly defined as there is not a reserved shorthand notation (like \w, \s, \d) for them. Here are the accepted ones (you can modify as you wish):
!, #, #, $, %, ^, &, *, (, ), -, +, _, =, ?
The key to understanding regex look aheads is to remember that they do not move the position of the parser. Meaning that (?=...) will start looking at the first character after the last pattern match, as will subsequent (?=...) look aheads.

Regex to match two or more consecutive characters

Using regular expressions I want to match a word which
starts with a letter
has english alpahbets
numbers, period(.), hyphen(-), underscore(_)
should not have two or more consecutive periods or hyphens or underscores
can have multiple periods or hyphens or underscore
For example,
flin..stones or flin__stones or flin--stones
are not allowed.
fl_i_stones or fli_st.ones or flin.stones or flinstones
is allowed .
So far My regular expression is ^[a-zA-Z][a-zA-Z\d._-]+$
So My question is how to do it using regular expression

You can use a lookahead and a backreference to solve this. But note that right now you are requiring at least 2 characters. The starting letter and another one (due to the +). You probably want to make that + and * so that the second character class can be repeated 0 or more times:
^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$
How does the lookahead work? Firstly, it's a negative lookahead. If the pattern inside finds a match, the lookahead causes the entire pattern to fail and vice-versa. So we can have a pattern inside that matches if we do have two consecutive characters. First, we look for an arbitrary position in the string (.*), then we match single (arbitrary) character (.) and capture it with the parentheses. Hence, that one character goes into capturing group 1. And then we require this capturing group to be followed by itself (referencing it with \1). So the inner pattern will try at every single position in the string (due to backtracking) whether there is a character that is followed by itself. If these two consecutive characters are found, the pattern will fail. If they cannot be found, the engine jumps back to where the lookahead started (the beginning of the string) and continue with matching the actual pattern.
Alternatively you can split this up into two separate checks. One for valid characters and the starting letter:
^[a-zA-Z][a-zA-Z\d._-]*$
And one for the consecutive characters (where you can invert the match result):
(.)\1
This would greatly increase the readability of your code (because it's less obscure than that lookahead) and it would also allow you to detect the actual problem in pattern and return an appropriate and helpful error message.

Is my C# Reg-ex correct?

Is this Regex correct if I have to match a string which is atleast 7 characters long, not more than 20 characters, has atleast 1 number, and atleast 1 letter? It has no other constraints.
[0-9]+[A-Za-z]+{7,20}
Thanks

No, it's not. The quantifier {7,20} doesn't apply to a token (repetition in regexes is done with quantifiers, like *, +, ? or the more general {n,m} – you cannot use more than one quantifier on a single token [in this case [a-zA-Z]]; *? is a quantifier on its own and thus doesn't play by above rules). You'll need something like the following:
^(?=.*\d)(?=.*[a-zA-Z]).{7,20}$
This has two lookaheads making sure of at least one digit and at least one letter:
(?=.*\d)
(?=.*[a-zA-Z])
Lookarounds are zero-width assertions; they do not consume characters in the string so they are merely matching a position. But they make sure that the expression inside of them would match at the current point. In this case this expression would match arbitrarily many characters and then would require a digit or a letter, respectively.
The actual match itself,
.{7,20}
just makes sure the length matches. What characters are used is irrelevant because we made sure of that constraints above already.
Finally the whole expression is anchored in that a start-of-string and end-of-string anchor are inserted at the start and end:
^...$
This makes sure that the match really encompasses the whole string. While not strictly necessary in this case (it would match the whole string anyway in all valid cases) it's often a good idea to include because usually regexes match only substrings and this can lead to subtle problems where validation regexes match even though they should fail. E.g. using \d+ to make sure a string consists only of digits would match the string a4b which puzzles beginners quite often.
I also changed that the order of letters and numbers doesn't matter. Your regex looks like it tries to impose a definite order where all numbers need to come before all letters which usually isn't what's wanted here.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Underscore in regex not validating - c#

This is the regex that AWS Cogito uses, it should apply to your situation: #"^(?=.[a-z])(?=.[A-Z])(?=.[0-9])(?=.[\^$*.\[\]{}\(\)?\-“!##%&\/,><’:;|_~`])\S{8,99}$" You can check regexes at http://regexstorm.net, it's faster than building your application everytime.

In you regexp add underscore in 3rd Capturing Group regex101 #"^[^\s](?=(.[A-Za-z]){1,})(?=(.[\d]){1,})(?=(.[\W_]){1,})(?=(.[!##$%^&()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).[^\s]$"

Related

Update regular expression to only allow single spaces [duplicate]

C# Regex boundary with special characters

Explain the Regex mentioned

Regex to match two or more consecutive characters

Is my C# Reg-ex correct?

Categories

Resources

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Underscore in regex not validating - c#

This is the regex that AWS Cogito uses, it should apply to your situation: #"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[\^$*.\[\]{}\(\)?\-“!##%&\/,><’:;|_~`])\S{8,99}$" You can check regexes at http://regexstorm.net, it's faster than building your application everytime.

In you regexp add underscore in 3rd Capturing Group regex101 #"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W_]){1,})(?=(.*[!##$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"

Related

Update regular expression to only allow single spaces [duplicate]

C# Regex boundary with special characters

Explain the Regex mentioned

Regex to match two or more consecutive characters

Is my C# Reg-ex correct?

Categories

Resources

This is the regex that AWS Cogito uses, it should apply to your situation: #"^(?=.[a-z])(?=.[A-Z])(?=.[0-9])(?=.[\^$*.\[\]{}\(\)?\-“!##%&\/,><’:;|_~`])\S{8,99}$" You can check regexes at http://regexstorm.net, it's faster than building your application everytime.

In you regexp add underscore in 3rd Capturing Group regex101 #"^[^\s](?=(.[A-Za-z]){1,})(?=(.[\d]){1,})(?=(.[\W_]){1,})(?=(.[!##$%^&()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).[^\s]$"