Regex to match two or more consecutive characters - c#

Using regular expressions, I want to match a word which:
starts with a letter
contains only English letters, numbers, period (.), hyphen (-) and underscore (_)
does not have two or more consecutive periods, hyphens or underscores
can have multiple periods, hyphens or underscores
For example,
flin..stones or flin__stones or flin--stones
are not allowed.
fl_i_stones or fli_st.ones or flin.stones or flinstones
are allowed.
So far my regular expression is ^[a-zA-Z][a-zA-Z\d._-]+$
So my question is: how do I enforce the no-consecutive rule using a regular expression?

You can use a lookahead and a backreference to solve this. But note that right now you are requiring at least 2 characters: the starting letter and another one (due to the +). You probably want to change that + to a * so that the second character class can be repeated 0 or more times:
^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$
How does the lookahead work? Firstly, it's a negative lookahead: if the pattern inside finds a match, the lookahead causes the entire pattern to fail, and vice versa. So the pattern inside should match exactly when there are two identical consecutive characters. First, we skip to an arbitrary position in the string (.*), then we match a single (arbitrary) character (.) and capture it with the parentheses, so that one character goes into capturing group 1. Then we require this character to be followed by itself (referencing it with \1). Thanks to backtracking, the inner pattern is tried at every position in the string to see whether any character is followed by itself. If such a pair is found, the whole pattern fails. If not, the engine jumps back to where the lookahead started (the beginning of the string) and continues with matching the actual pattern. Note that (.) rejects any doubled character, including letters (for example the tt in letter); to restrict the check to the three separators from the question, use ([._-]) in place of (.).
Alternatively you can split this up into two separate checks. One for valid characters and the starting letter:
^[a-zA-Z][a-zA-Z\d._-]*$
And one for the consecutive characters (where you can invert the match result):
(.)\1
This would greatly increase the readability of your code (because it's less obscure than that lookahead), and it would also allow you to detect the actual problem in the input and return an appropriate and helpful error message.
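The two-check approach could be sketched in C# roughly as follows. The class name WordValidator, the method IsValid and the error texts are hypothetical, and the second check uses ([._-])\1 instead of (.)\1 on the assumption that only doubled separators should be rejected (a mixed pair such as ".-" is still accepted by this sketch).

```csharp
using System.Text.RegularExpressions;

public static class WordValidator
{
    // Check 1: starts with a letter, then only letters, digits, '.', '-' or '_'.
    private static readonly Regex Shape = new Regex(@"^[a-zA-Z][a-zA-Z\d._-]*$");

    // Check 2: the same separator twice in a row ("..", "--" or "__").
    // Assumption: doubled letters or digits are fine, so the group is
    // ([._-]) rather than (.).
    private static readonly Regex DoubledSeparator = new Regex(@"([._-])\1");

    // Returns true when the word is valid; otherwise 'error' describes the problem.
    public static bool IsValid(string word, out string error)
    {
        error = "";

        if (!Shape.IsMatch(word))
        {
            error = "must start with a letter and contain only letters, digits, '.', '-' or '_'";
            return false;
        }

        Match doubled = DoubledSeparator.Match(word);
        if (doubled.Success)
        {
            error = "repeated separator '" + doubled.Value + "' at index " + doubled.Index;
            return false;
        }

        return true;
    }
}
```

Calling WordValidator.IsValid("flin..stones", out var error) should return false with a message pointing at index 4, while "fli_st.ones" passes both checks.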

Related

Underscore in regex not validating

How do I add underscore as a part of my regex string?
Here is my string that checks for uppercase, lowercase, numbers and special characters. The rest of the special characters work. Validation isn't working for underscores.
@"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W]){1,})(?=(.*[!@#$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"
Any ideas?
Thanks
This is the regex that AWS Cognito uses; it should apply to your situation:
@"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[\^$*.\[\]{}\(\)?\-“!@#%&\/,><’:;|_~`])\S{8,99}$"
You can check regexes at http://regexstorm.net, which is faster than building your application every time.
I've approached it like this: I took your requirements and made them into separate positive lookaheads:
Check for:
uppercase (?=.*[A-Z])
lowercase (?=.*[a-z]) (note that I broke A-Z and a-z up into separate groups)
numbers (?=.*\d)
special characters (?=.*[!@#$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])
You can then combine them in any order; I've combined them in the order listed above and anchored the result to the beginning of the line using ^. Don't add any extra matches before, in between or after the groups that could cause the regex to enforce a certain ordering of the groups:
The lookahead for any non-word character \W makes it impossible to match Underscore1_, since \W only matches "anything other than a letter, digit or underscore", and letters, digits and an underscore are all that Underscore1_ contains.
The starting [^\s] (and ending [^\s]) that consumes one character is likely destroying a lot of good matches. Whether the password is Underscore1_ or _1scoreUnder shouldn't matter, but if you start with _ and consume it with [^\s] like you do, the later lookahead for a special character will fail (unless you have a second special character in the password).
@"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])"
If you have a minimum length requirement of, say, 7 characters, you just have to add .{7,}$ to the end of the regex, making it:
@"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]).{7,}$"
Without a minimum length, a password of one character from each group will be enough, and since there are 4 groups, a password with only 4 characters will pass the filter.
I see no point in putting an upper length limit into the regex. If the user interface has accepted a string that is thousands of characters long, then why reject it for being too long later? The length of what you store is probably going to be much smaller anyway since you'll be storing the bcrypt/scrypt/argon2/... encoded password.
Suggestion: Also add space (or even whitespaces) to the list of special characters.
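As a rough illustration of the lookahead-per-requirement answer above, here is a minimal C# sketch. The class PasswordPolicy and method IsAcceptable are hypothetical names, and the pattern assumes the special-character list starts with !@#$ and that a minimum length of 7 is wanted.

```csharp
using System.Text.RegularExpressions;

public static class PasswordPolicy
{
    // One lookahead per requirement, then a minimum length of 7.
    // Inside a verbatim string, "" stands for a single double-quote character.
    private static readonly Regex Policy = new Regex(
        @"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)" +                 // upper, lower, digit
        @"(?=.*[!@#$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~])" +  // one special character
        @".{7,}$");                                          // at least 7 characters

    public static bool IsAcceptable(string password)
    {
        return Policy.IsMatch(password);
    }
}
```

Because the lookaheads all start from the same position and nothing is consumed before them, Underscore1_ and _1scoreUnder are both accepted; the position of the underscore does not matter.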
In your regex, add an underscore to the character class in the third capturing group:
@"^[^\s](?=(.*[A-Za-z]){1,})(?=(.*[\d]){1,})(?=(.*[\W_]){1,})(?=(.*[!@#$%^&*()-+=\[{\]};:<>|_.\\/?,\-`'""~]{1,})).*[^\s]$"

C# Regex boundary with special characters

I want to have a Regex that finds "Attributable".
I tried @"\bAttributable\b" but the \b boundary doesn't work with special characters.
For example, it wouldn't differentiate Attributable and Non-Attributable. Is there any way to write a regex that matches Attributable but not its negative?
Do a negative look-behind?
(?<!-)\bAttributable\b
Obviously this only checks for -s. If you want to check for other characters, put them in a character class in the negative look-behind:
(?<![-^])\bAttributable\b
Alternatively, if you just want to not match Non-Attributable but do match SomethingElse-Attributable, then put Non- in the look-behind:
(?<!Non-)\bAttributable\b
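The difference between the two lookbehinds can be seen in a small C# sketch. The class and method names are hypothetical, and Semi-Attributable is just a made-up sample input.

```csharp
using System.Text.RegularExpressions;

public static class AttributableLookbehind
{
    // Rejects any hyphenated prefix: "Non-Attributable", "Semi-Attributable", ...
    private static readonly Regex NoHyphenBefore =
        new Regex(@"(?<!-)\bAttributable\b");

    // Rejects only the "Non-" prefix.
    private static readonly Regex NoNonBefore =
        new Regex(@"(?<!Non-)\bAttributable\b");

    public static bool MatchesWithoutHyphen(string text)
    {
        return NoHyphenBefore.IsMatch(text);
    }

    public static bool MatchesWithoutNon(string text)
    {
        return NoNonBefore.IsMatch(text);
    }
}
```

Both methods accept a plain Attributable and reject Non-Attributable; only MatchesWithoutNon accepts Semi-Attributable.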
There are several ways to fix an issue like yours, but it all depends on the real requirements. It is sometimes necessary to spell out which "word boundary" you need in each concrete case, since the \b word boundary is 1) context dependent, and 2) matches specific places in the string that you should be aware of:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Now, here are several approaches that you may follow:
When you only care about compound words usually joined with hyphens (similar to @Sweeper's answer): (?<!-)\bAttributable\b(?!-)
Only match between whitespace or start/end of string: (?<!\S)Attributable(?!\S). NOTE: Actually, if that is what you want, you may do without a regex by using s.Split().Contains("Attributable")
Only match if not preceded by punctuation and there is no letter/digit/underscore right after: (?<!\p{P})Attributable\b
Only match if not preceded by punctuation symbols other than some specific ones (say, you want to match the word after a comma or a semicolon): (?<![^\P{P},;])Attributable\b.
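For the whitespace-delimited variant in the list above, the regex and the Split-based alternative can be compared side by side. This is only a sketch; the class and method names are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class AttributableWholeToken
{
    // Matches only when the word is surrounded by whitespace or string boundaries.
    private static readonly Regex Token =
        new Regex(@"(?<!\S)Attributable(?!\S)");

    public static bool ByRegex(string s)
    {
        return Token.IsMatch(s);
    }

    // Split() with no arguments splits on whitespace characters.
    public static bool BySplit(string s)
    {
        return s.Split().Contains("Attributable");
    }
}
```

Note that both variants treat trailing punctuation as part of the token, so a sentence ending in "Attributable." is not matched by either.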

C# Regular Expression: Search the first 3 letters of each name

Does anyone know how I can write a regex (C#) that searches on the first 3 letters of each part of a full name?
Without the use of (.*)
I used (.*) but it runs through the text far beyond the requested name, or
if it finds the first condition and then finds the second condition 100 words later, it returns text that is not what I am looking for, so I have to limit the number of words.
Example: \s*(?:\s+\S+){0,2}\s*
I would like to ignore name parts with fewer than 3 characters if they exist in the name.
Search for any name whose parts start with these 3-character prefixes:
'Mar Jac Rey' (the search terms the regex should handle)
Should match:
Marck Jacobs L. S. Reynolds
Marcus Jacobine Reys
Maroon Jacqueline by Reyils
Can anyone help me?
The zero or more quantifier (*) is 'greedy' by default: it will consume as many characters as possible while still allowing the remainder of the pattern to match. This is why Mar.*Jac will match the first Mar in the input and the last Jac and everything in between.
One potential solution is just to make your quantifier 'non-greedy' (*?). This will make it consume as few characters as possible while still matching the remainder of the pattern.
Mar.*?Jac.*?Rey
However, this is not a great solution because it would still match the various name parts regardless of what other text appears in between—e.g. Marcus Jacobine Should Not Match Reys would be a valid match.
To allow only whitespace or at most 2 consecutive non-whitespace characters to appear between each name part, you'd have to get more fancy:
\bMar\w*(\s+\S{0,2})*\s+Jac\w*(\s+\S{0,2})*\s+Rey\w*
The pattern (\s+\S{0,2})*\s+ will match any number of non-whitespace tokens of at most two characters each, separated by whitespace. The \w* after each name part ensures that the entire name is included in that part of the match (you might want to use \S* instead here, but that's not entirely clear from your question). And I threw in a word boundary (\b) at the beginning to ensure that the match does not start in the middle of a 'word' (e.g. OMar would not match).
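To see how the non-greedy and the stricter pattern differ on the sample names, something like the following C# sketch could be used. The class NameSearch and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class NameSearch
{
    // Non-greedy version: allows arbitrary text between the name parts.
    private static readonly Regex Lazy = new Regex(@"Mar.*?Jac.*?Rey");

    // Stricter version: only whitespace and tokens of at most two
    // non-whitespace characters may appear between the name parts.
    private static readonly Regex Strict = new Regex(
        @"\bMar\w*(\s+\S{0,2})*\s+Jac\w*(\s+\S{0,2})*\s+Rey\w*");

    public static bool MatchesLazy(string text)
    {
        return Lazy.IsMatch(text);
    }

    public static bool MatchesStrict(string text)
    {
        return Strict.IsMatch(text);
    }
}
```

The three names from the question match both patterns, whereas Marcus Jacobine Should Not Match Reys is accepted only by the non-greedy one.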
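I think what you want is this regular expression, which checks whether the string starts with one of the three prefixes (pass RegexOptions.IgnoreCase to make it case-insensitive). Note the parentheses: square brackets would create a character class that matches any three of the listed characters rather than the three prefixes.
@"^(Mar|Jac|Rey)"
Less specific:
@"^[\w]{3}"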
If you want to capture the first three letters of every word that is longer than three characters, you could use something like:
((?<name>[\w]{3})\w+)+
And enable ExplicitCapture when initializing your Regex.
It will return a series of Match objects, each with a group named "name" holding one result.
Code sample:
Regex regex = new Regex(@"((?<name>[\w]{3})\w+)+", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
var matches = regex.Matches("Marck Jacobs L. S. Reynolds");
If you also want to capture three-character words, you can replace the last "\w" with a space (or, more simply, change \w+ to \w*). If you go the space route, remember to handle the last word of the phrase.
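A slightly fuller version of that sample might look like the sketch below. It is not the answerer's exact pattern: it uses \w* so that three-character words are included and a leading \b so each match starts at a word boundary, and the names NamePrefixes and Extract are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamePrefixes
{
    // Captures the first three word characters of every word that has
    // at least three of them; shorter tokens such as "L." are skipped.
    private static readonly Regex Prefix = new Regex(
        @"\b(?<name>\w{3})\w*", RegexOptions.ExplicitCapture);

    public static List<string> Extract(string fullName)
    {
        var result = new List<string>();
        foreach (Match m in Prefix.Matches(fullName))
        {
            result.Add(m.Groups["name"].Value);
        }
        return result;
    }
}
```

The resulting list can then be compared against the search terms (Mar, Jac, Rey) with ordinary string comparisons instead of one large regex.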

Is my C# Reg-ex correct?

Is this regex correct if I have to match a string which is at least 7 characters long, not more than 20 characters, and has at least 1 number and at least 1 letter? It has no other constraints.
[0-9]+[A-Za-z]+{7,20}
Thanks
No, it's not. The quantifier {7,20} doesn't apply to a token here, because it directly follows another quantifier (+). Repetition in regexes is done with quantifiers like *, +, ? or the more general {n,m}, and you cannot use more than one quantifier on a single token (in this case [A-Za-z]); *? is a quantifier in its own right and thus doesn't fall under this rule. You'll need something like the following:
^(?=.*\d)(?=.*[a-zA-Z]).{7,20}$
This has two lookaheads making sure of at least one digit and at least one letter:
(?=.*\d)
(?=.*[a-zA-Z])
Lookarounds are zero-width assertions; they do not consume characters in the string so they are merely matching a position. But they make sure that the expression inside of them would match at the current point. In this case this expression would match arbitrarily many characters and then would require a digit or a letter, respectively.
The actual match itself,
.{7,20}
just makes sure the length matches. Which characters are used is irrelevant because we already checked those constraints above.
Finally the whole expression is anchored in that a start-of-string and end-of-string anchor are inserted at the start and end:
^...$
This makes sure that the match really encompasses the whole string. Here that matters for the upper length limit: without the anchors, .{7,20} would happily match a 20-character substring of a longer string. In general it's a good idea to include the anchors, because regexes normally match substrings and this can lead to subtle problems where validation regexes match even though they should fail. E.g. using \d+ to make sure a string consists only of digits would match the string a4b, which puzzles beginners quite often.
I also changed the regex so that the order of letters and numbers doesn't matter. Your regex looks like it tries to impose a definite order where all numbers need to come before all letters, which usually isn't what's wanted here.
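A quick way to try this out in C# is sketched below. The class and method names are hypothetical; the unanchored variant is included only to show what the ^ and $ anchors contribute.

```csharp
using System.Text.RegularExpressions;

public static class LengthAndContent
{
    // 7 to 20 characters, at least one digit and at least one letter.
    private static readonly Regex Anchored =
        new Regex(@"^(?=.*\d)(?=.*[a-zA-Z]).{7,20}$");

    // Same pattern without anchors: it only needs to find a matching substring.
    private static readonly Regex Unanchored =
        new Regex(@"(?=.*\d)(?=.*[a-zA-Z]).{7,20}");

    public static bool IsValid(string s)
    {
        return Anchored.IsMatch(s);
    }

    public static bool IsValidUnanchored(string s)
    {
        return Unanchored.IsMatch(s);
    }
}
```

With a 21-character input such as twenty letters followed by one digit, IsValid rejects the string while IsValidUnanchored accepts it, because the first 20 characters already satisfy .{7,20}.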

.NET regex matching

Broadly: how do I match a word with regex rules for a)the beginning, b)the whole word, and c)the end?
More specifically: How do I match an expression of length >= 1 that has the following rules:
It cannot have any of: ! @ #
It cannot begin with a space or =
It cannot end with a space
I tried:
^[^\s=][^!@#]*[^\s]$
But the ^[^\s=] part consumes the first character of the word without checking it against the forbidden set. Hence this also matches words that begin with '!' or '@' or '#' (e.g. '#ab' or '@aa'). It also forces the word to have at least 2 characters (one beginning character that is not a space or =, and one non-space character at the end).
I got to:
^[^\s=(!@#)]\1*$
for a regex matching the first two rules. But how do I disallow trailing spaces while still allowing words of length 1?
Cameron's solution is both accurate and efficient (and should be used for any production code where speed needs to be optimized). The answer presented here is less efficient, but demonstrates a general approach for applying logic using regular expressions.
You can use multiple positive and negative lookahead assertions (all applied at one location in the target string, typically the beginning) to apply multiple logical constraints to a match. The commented regex below demonstrates how easy this is to do for this example case. You do need to understand how the regex engine actually matches (and doesn't match) to come up with the correct expressions, but it's not hard once you get the hang of it.
foundMatch = Regex.IsMatch(subjectString, @"
    # Match 'word' meeting multiple logical constraints.
    ^               # Anchor to start of string.
    (?=[^!@#]*$)    # It cannot have any of: ! @ #, AND
    (?![ =])        # It cannot begin with a space or =, AND
    (?!.*\s$)       # It cannot end with a space, AND
    .{1,}           # length >= 1 (ok to match special 'word')
    \z              # Anchor to end of string.
    ",
    RegexOptions.IgnorePatternWhitespace);
This application of "regex-logic" is frequently used for complex password validation.
Your first attempt was very close. You only need to exclude more characters for the first and last parts, and make the last two parts optional:
^[^\s=!@#](?:[^!@#]*[^\s!@#])?$
This ensures that none of the three sections will include any of !@#. Then, if the word is more than one character long, it will need to end with a non-space, with only permitted characters filling the space in between. This is all enforced properly because of the ^ and $ anchors.
I'm not quite sure what your second example matched, since the () should be taken as literal characters when embedded within a character class, not as a capturing group.
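The two formulations from these answers can be checked against each other in C#. The sketch below assumes the forbidden characters are ! @ #, writes the lookahead version on one line with the trailing-space check as (?!.*\s$), and uses hypothetical class and method names.

```csharp
using System.Text.RegularExpressions;

public static class WordRules
{
    // Direct character-class formulation.
    private static readonly Regex Direct =
        new Regex(@"^[^\s=!@#](?:[^!@#]*[^\s!@#])?$");

    // One lookahead per rule, all evaluated at the start of the string.
    private static readonly Regex Lookaheads =
        new Regex(@"^(?=[^!@#]*$)(?![ =])(?!.*\s$).+\z");

    public static bool IsValidDirect(string word)
    {
        return Direct.IsMatch(word);
    }

    public static bool IsValidLookaheads(string word)
    {
        return Lookaheads.IsMatch(word);
    }
}
```

Both methods accept "a" and "a b" and reject " ab", "=ab", "ab ", "#ab" and the empty string. They are not identical in every case: the direct pattern rejects any leading whitespace character, while the lookahead version only checks for a leading space.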
