I use the following regex, which is working, but I want to add a condition so as to accept spaces at the end of the value. Currently it is not working.
What am I missinghere?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$
Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
this is matching any amount of whitespace at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, digits and connector characters like "_") [\p{L}\p{Nd}\p{Pc}] as character class with Unicode properties. Definition on msdn
why two dollars?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this :
"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar.
Try this -
"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this -
"^[a-zA-Z][a-zA-Z0-9_]+((\s){,})$"
$ indicates end of expression, if you are looking $ as character, then escape it with \
Related
What is the regular expression for all characters except white space , and minimum6 characters.
This is what I have now :
^[\w'?#&#.]{6,}$
But this does not accept all the special characters. And I am using in .net app if that makes any difference
[^\s]{6,}$ should make it. But note the answer above, if you only want to skip the white spaces, you better use [^ ]. The notation [^\s] will ignore any white space character (space, tab or newline).
A .NET regex to match any string that does not contain any whitespace chars (at least 6 occurrences) is
\A\S{6,}\z
See the regex demo online
Do not use $ because it may match before a final \n (LF symbol) inside a string, \z is the most appropriate anchor here as it matches the very end of the string. To make the string compatible with JS (if you use it in ASP.NET for both server and client side validation) you need to use ^\S{6,}$(?!\n).
The \S shorthand character class matches any character other than a Unicode whitespace char (if ECMAScript option is not used).
The {6,} limiting quantifier matches six or more occurrences of the quantified subpattern.
Details
\A - (an unambiguous anchor, its behavior cannot be redefined with any regex options) start of a string
\S{6,} - any 6 or more chars other than a Unicode whitespace char
\z - the very end of the string.
I want to have a Regex that finds "Attributable".
I tried #"\bAttributable\b" but the \b boundary doesn't work with special characters.
For example, it wouldn't differentiate Attributable and Non-Attributable. Is there any way to Regex for Attributable and not it's negative?
Do a negative look-behind?
(?<!-)\bAttributable\b
Obviously this only checks for -s. If you want to check for other characters, put them in a character class in the negative look-behind:
(?<![-^])\bAttributable\b
Alternatively, if you just want to not match Non-Attributable but do match SomethingElse-Attributable, then put Non- in the look-behind:
(?<!Non-)\bAttributable\b
There are several ways to fix the issue like you have but it all depends on the real requirements. It is sometimes necessary to precise what "word boundary" you need in each concrete case, since \b word boundary is 1) context dependent, and 2) matches specific places in the string that you should be aware of:
Before the first character in the string, if the first character is a
word character.
After the last character in the string, if the last
character is a word character.
Between two characters in the string,
where one is a word character and the other is not a word character.
Now, here are several approaches that you may follow:
When you only care about compound words usually joined with hyphens (similar #Sweeper's answer): (?<!-)\bAttributable\b(?!-)
Only match between whitespaces or start/end of string: (?<!\S)Attributable(?!\S). NOTE: Actually, if it is what you want, you may do without a regex by using s.Split().Contains("Attributable")
Only match if not preceded with punctuation and there is no letter/digit/underscore right after: (?<!\p{P})Attributable\b
Only match if not preceded with punctation symbols but some specific ones (say, you want to match the word after a comma and a colon): (?<![^\P{P},;])Attributable\b.
Can any one please explain the regex below, this has been used in my application for a very long time even before I joined, and I am very new to regex's.
/^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$/
As far as I understand
this regex will validate
- for a minimum of 6 chars to a maximum of 10 characters
- will escape the characters like ^ and $
also, my basic need is that I want a regex for a minimum of 6 characters with 1 character being a digit and the other one being a special character.
^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$
^ is called an "anchor". It basically means that any following text must be immediately after the "start of the input". So ^B would match "B" but not "AB" because in the second "B" is not the first character.
.* matches 0 or more characters - any character except a newline (by default). This is what's known as a greedy quantifier - the regex engine will match ("consume") all of the characters to the end of the input (or the end of the line) and then work backwards for the rest of the expression (it "gives up" characters only when it must). In a regex, once a character is "matched" no other part of the expression can "match" it again (except for zero-width lookarounds, which is coming next).
(?=.{6,10}) is a lookahead anchor and it matches a position in the input. It finds a place in the input where there are 6 to 10 characters following, but it does not "consume" those characters, meaning that the following expressions are free to match them.
(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) is another lookahead anchor. It matches a position in the input where the following text contains four letters ([a-zA-Z] matches one lowercase or uppercase letter), but any number of other characters (including zero characters) may be between them. For example: "++a5b---C#D" would match. Again, being an anchor, it does not actually "consume" the matched characters - it only finds a position in the text where the following characters match the expression.
(?=.*\d.*\d) Another lookahead. This matches a position where two numbers follow (with any number of other characters in between).
.* Already covered this one.
$ This is another kind of anchor that matches the end of the input (or the end of a line - the position just before a newline character). It says that the preceding expression must match characters at the end of the string. When ^ and $ are used together, it means that the entire input must be matched (not just part of it). So /bcd/ would match "abcde", but /^bcd$/ would not match "abcde" because "a" and "e" could not be included in the match.
NOTE
This looks like a password validation regex. If it is, please note that it's broken. The .* at the beginning and end will allow the password to be arbitrarily longer than 10 characters. It could also be rewritten to be a bit shorter. I believe the following will be an acceptable (and slightly more readable) substitute:
^(?=(.*[a-zA-Z]){4})(?=(.*\d){2}).{6,10}$
Thanks to #nhahtdh for pointing out the correct way to implement the character length limit.
Check Cyborgx37's answer for the syntax explanation. I'll do some explanation on the meaning of the regex.
^.*(?=.{6,10})(?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z])(?=.*\d.*\d).*$
The first .* is redundant, since the rest are zero-width assertions that begins with any character ., and .* at the end.
The regex will match minimum 6 characters, due to the assertion (?=.{6,10}). However, there is no upper limit on the number of characters of the string that the regex can match. This is because of the .* at the end (the .* in the front also contributes).
This (?=.*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z]) part asserts that there are at least 4 English alphabet character (uppercase or lowercase). And (?=.*\d.*\d) asserts that there are at least 2 digits (0-9). Since [a-zA-Z] and \d are disjoint sets, these 2 conditions combined makes the (?=.{6,10}) redundant.
The syntax of .*[a-zA-Z].*[a-zA-Z].*[a-zA-Z].*[a-zA-Z] is also needlessly verbose. It can be shorten with the use of repetition: (?:.*[a-zA-Z]){4}.
The following regex is equivalent your original regex. However, I really doubt your current one and this equivalent rewrite of your regex does what you want:
^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).*$
More explicit on the length, since clarity is always better. Meaning stay the same:
^(?=(?:.*[a-zA-Z]){4})(?=(?:.*\d){2}).{6,}$
Recap:
Minimum length = 6
No limit on maximum length
At least 4 English alphabet, lowercase or uppercase
At least 2 digits 0-9
REGEXPLANATION
/.../: slashes are often used to represent the area where the regex is defined
^: matches beginning of input string
.: this can match any character
*: matches the previous symbol 0 or more times
.{6,10}: matches .(any character) somewhere between 6 and 10 times
[a-zA-Z]: matches all characters between a and z and between A and Z
\d: matches a digit.
$: matches the end of input.
I think that just about does it for all the symbols in the regex you've posted
For your regex request, here is what you would use:
^(?=.{6,}$)(?=.*?\d)(?=.*?[!##$%&*()+_=?\^-]).*
And here it is unrolled for you:
^ // Anchor the beginning of the string (password).
(?=.{6,}$) // Look ahead: Six or more characters, then the end of the string.
(?=.*?\d) // Look ahead: Anything, then a single digit.
(?=.*?[!##$%&*()+_=?\^-]) // Look ahead: Anything, and a special character.
.* // Passes our look aheads, let's consume the entire string.
As you can see, the special characters have to be explicitly defined as there is not a reserved shorthand notation (like \w, \s, \d) for them. Here are the accepted ones (you can modify as you wish):
!, #, #, $, %, ^, &, *, (, ), -, +, _, =, ?
The key to understanding regex look aheads is to remember that they do not move the position of the parser. Meaning that (?=...) will start looking at the first character after the last pattern match, as will subsequent (?=...) look aheads.
I want to filter out special characters in C#.
Basically i want to allow A-Z, a-z, 0-9, hyphen, underscore, (, ), commas, spaces, \, /, spaces.
Everything else is not allowed.
I have come up with the following regex ->
[a-zA-Z0-9-\b/(),_\s]*
but this doesn't seem to work fine.
Am I missing something?
If you want to filter out characters that don't match those, use a ^ at the beginning of the character class:
[^a-zA-Z0-9\-\\/(),_\s]+
The + quantifier will match any chars not in the character class at least once. Also, hyphens are meta characters inside character classes, so you should escape the dangling one you have, as I have done in my example. Also, if you want to include \ as an allowed character, you also need to escape it inside a character class, like [\\].
Also, inside a character class (also known as a character set defined by [ ]), \b is a backspace character, not a word boundary.
^[a-zA-Z0-9\-_(),\s\\/]+$
it's for the whole line
I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?
If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef- – the regex will match the substring abcdef just fine, because you didn't tell it to match only at the start or end of the string; and ignore the other characters.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
In regular expressions the + tells the engine to match one or more characters.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. But would allow strings that include non-alphanumerics so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that begins and ends with a sequence of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.