why do these regex tests let certain characters pass?

why do these regex tests let certain characters pass? - c#

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?

If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef- – the regex will match the substring abcdef just fine, because you didn't tell it to match only at the start or end of the string; and ignore the other characters.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.

In regular expressions the + tells the engine to match one or more characters.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. But would allow strings that include non-alphanumerics so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that begins and ends with a sequence of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.

Related

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: #"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use Regex.IsMatch() function. However if I use RegularExpressionAttribute in ASP.NET MVC, I always get invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
The RegularExpressionAttribute Specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression..
Which means. I need the "abcdef" to match, and ".abcdefg" to not match. Basically negate the whole expression I have above.

You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch) a negated character class will perform better:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape / and ? chars inside the .NET regex since you can't use regex delimiters there.
C# declaration of the last pattern will look like
#"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
See this .NET regex demo.
RegularExpressionAttrubute:
[RegularExpression(
#"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")
]

Your regex is an alternation which matches 1 character out of 3 character classes, the first consisting of more than 1 characters, the second a dot at the start of the string and the third a dot at the end of the string.
It works fine because it does match one of the alternations, only not the whole string you want to match.
You could use 3 alternations where the first matches a dot followed by repeating the character class until the end of the string, the second the other way around but this time the dot is at the end of the string.
Or the third using a positive lookahead asserting that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$
Regex demo

Ignore spaces at the end of a string

I use the following regex, which is working, but I want to add a condition so as to accept spaces at the end of the value. Currently it is not working.
What am I missinghere?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$

Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
this is matching any amount of whitespace at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, digits and connector characters like "_") [\p{L}\p{Nd}\p{Pc}] as character class with Unicode properties. Definition on msdn

why two dollars?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this :
"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar.

Try this -
"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this -
"^[a-zA-Z][a-zA-Z0-9_]+((\s){,})$"
$ indicates end of expression, if you are looking $ as character, then escape it with \

Validate Unicode Length With Regex

How can I validate ۱۳۹۱/۰۹/۰۹ string with Regex
I want the length of each separate slash be exact as {4}/{2}/{2}
the Unicode range is [\u06F0-\u06F9].
I have problem with length checking.

You can use the following regular expression:
"^[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}$"
You're probably missing the ^ to make it start the match at the beginning of the string and the $ to make it end the match at the end of the string. Without these changes strings that were longer, but that contained your expression would yield as a match.
With this change a match is only successful if the string contains your pattern and does not have any extra characters to the left or to the right of the target pattern.

This regex should work for you:
"(^|[^\u06F0-\u06F9]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([^\u06F0-\u06F9]{1}|$)"
Match the date expression under both of the following conditions:
Condition1: It should be either at the beginning of the string or after a single character that's not in the character range [\u06F0-\u06F9]
Condition2: It should be either at the end of the string or before a single character that's not in the character range [\u06F0-\u06F9]
This will not match the expression in this string:
How can I validate ۱۱۳۹۱/۰۹/۰۹ string with Regex
-------------------^5Numbers, not matched
Or this string:
How can I validate ۱۱۳۹۱/۰۹/۰۹۹ string with Regex
------------------------------^Three numbers, not matched
but still will match the date expression in this string:
How can I validate۱۳۹۱/۰۹/۰۹string with Regex
------------------^---------^ No whitespaces above ^, the expression is matched though
If you want to avoid this, i.e, just match the date expression alone, with whitespaces (and linebreaks) before and after it, use the following Regex:
(^|[ \t\n]{1})[\u06F0-\u06F9]{4}/[\u06F0-\u06F9]{2}/[\u06F0-\u06F9]{2}([ \t\n]{1}|$)
Hope that's helpful.

Nothing else but Regex for matching the string

I want to check whether there is string starting from number and then optional character with the help of the regex.So what should be the regex for matching the string which must be started with number and then character might be there or not.Like there is string "30a" or "30" it should be matched.But if there is "a" or some else character or sereis of characters, string should not be matched.

Sounds like there should be able to be any number of numeric characters at the beginning followed by optional other characters. To match any other character after a series of numbers at the beginning I would use:
\d+.*
To match only alpha numeric characters after the mandatory numeric beginning I would use:
\d+\w*
Note: as pointed out by Dav, if you add a ^ to the start of the expression and a $ to the end of the expression like this ^\d+\w*$ you will ensure the whole string matches. However if you leave those off, you will be able to search the input string for what you need. It just depends on what your needs are.

^\d.*
The ^ matches the start of the string, \d matches a single digit, and then the .* matches any number of additional characters.
Thus, the net result is that it will only match if the string begins with a digit.

How to check for special characters using regex

NET. I have created a regex validator to check for special characters means I donot want any special characters in username. The following is the code
Regex objAlphaPattern = new Regex(#"[a-zA-Z0-9_#.-]");
bool sts = objAlphaPattern.IsMatch(username);
If I provide username as $%^&asghf then the validator gives as invalid data format which is the result I want but If I provide a data s23_#.-^&()%^$# then my validator should block the data but my validator allows the data which is wrong
So how to not allow any special characters except a-z A-A 0-9 _ # .-
Thanks
Sunil Kumar Sahoo

There's a few things wrong with your expression. First you don't have the start string character ^ and end string character $ at the beginning and end of your expression meaning that it only has to find a match somewhere within your string.
Second, you're only looking for one character currently. To force a match of all the characters you'll need to use * Here's what it should be:
Regex objAlphaPattern = new Regex(#"^[a-zA-Z0-9_#.-]*$");
bool sts = objAlphaPattern.IsMatch(username);

Your pattern checks only if the given string contains any "non-special" character; it does not exclude the unwanted characters. You want to change two things; make it check that the whole string contains only allowed characters, and also make it check for more than one character:
^[a-zA-Z0-9_#.-]+$
Added ^ before the pattern to make it start matching at the beginning of the string. Also added +$ after, + to ensure that there is at least one character in the string, and $ to make sure that the string is matched to the end.

Change your regex to ^[a-zA-Z0-9_#.-]+$. Here ^ denotes the beginning of a string, $ is the end of the string.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

why do these regex tests let certain characters pass? - c#

I am checking a string with the following regexes: [a-zA-Z0-9]+ [A-Za-z]+ For some reason, the characters: . - _ are allowed to pass, why is that?

Related

Regular expression in RegularExpressionAttribute behavior

Ignore spaces at the end of a string

Validate Unicode Length With Regex

Nothing else but Regex for matching the string

How to check for special characters using regex

Categories

Resources