C# Regex to detect usage of Special characters - c#

I want to filter out special characters in C#.
Basically, I want to allow A-Z, a-z, 0-9, hyphen, underscore, (, ), commas, spaces, \, and /.
Everything else is not allowed.
I have come up with the following regex ->
[a-zA-Z0-9-\b/(),_\s]*
but this doesn't seem to work fine.
Am I missing something?

If you want to filter out characters that don't match those, use a ^ at the beginning of the character class:
[^a-zA-Z0-9\-\\/(),_\s]+
The + quantifier matches one or more consecutive characters that are not in the character class. Also, hyphens are metacharacters inside character classes, so you should escape the dangling one you have, as I have done in my example. If you want to include \ as an allowed character, you also need to escape it inside a character class, like [\\].
Also, inside a character class (also known as a character set defined by [ ]), \b is a backspace character, not a word boundary.
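As a minimal sketch of how the negated character class above could be used from C#, the following checks whether an input contains any character outside the allowed set. The class and method names (SpecialCharCheck, HasDisallowedChars) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpecialCharCheck
{
    // Matches any single character that is NOT a letter, digit, hyphen,
    // backslash, slash, parenthesis, comma, underscore, or whitespace.
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z0-9\-\\/(),_\s]");

    // Hypothetical helper: true if the input contains a disallowed character.
    public static bool HasDisallowedChars(string input) => Disallowed.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(HasDisallowedChars(@"file_name-01 (copy), a\b/c")); // False
        Console.WriteLine(HasDisallowedChars("user@example"));               // True
    }
}
```

A verbatim string (@"...") keeps the backslashes readable; in a regular string literal each backslash would have to be doubled again.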

^[a-zA-Z0-9\-_(),\s\\/]+$
This one validates the whole line: it matches only if every character is in the allowed set.

Related

What kind of regex can allow special characters but refuse strings that only use special characters?

I am writing a regex for acceptable first names and last names. Currently I only want to allow the following:
a to z
A to Z
'
(-) (a dash symbol)
diacritics
My regex is @"^[a-zA-Z-'\p{L}]*$"
Although I want to allow apostrophes and dashes, I also don't want the name to be just a dash, or just an apostrophe. So to do this, I've written some extra regexes in Fluent Validator to catch these edge cases but it won't let me split them up.
.Matches(@"^[a-zA-Z-'\p{L}]*$")
.Matches(@"[^-]")
.Matches(@"[^']");
This also isn't that great since I also don't want to allow names that are just apostrophes like '''''' or just dashes like ---------.
Is there a more effective regex that can be written to handle all of these cases?
You can use a negative lookahead assertion for this:
@"^(?![-']*$)[-'\p{L}]*$"
Also, a-zA-Z are included in \p{L}.
Explanation:
^ # Start of string
(?! # Assert that it's impossible to match...
[-']* # a string of only dashes and apostrophes
$ # that extends to the end of the entire string.
) # End of lookahead.
[-'\p{L}]* # Match the allowed characters.
$ # End of string.
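To see the lookahead in action, here is a small sketch that wraps the pattern in a helper and tries a few names. NameValidator and IsValidName are hypothetical names chosen for this example, and the sample inputs are invented.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameValidator
{
    // Same pattern as above: allowed characters only, and not made up
    // solely of dashes and apostrophes (which also rejects the empty string).
    private static readonly Regex NamePattern =
        new Regex(@"^(?![-']*$)[-'\p{L}]*$");

    // Hypothetical helper for illustration.
    public static bool IsValidName(string name) => NamePattern.IsMatch(name);

    public static void Main()
    {
        Console.WriteLine(IsValidName("O'Brien"));    // True
        Console.WriteLine(IsValidName("Anne-Marie")); // True
        Console.WriteLine(IsValidName("Zo\u00EB"));   // True (precomposed letter with diaeresis)
        Console.WriteLine(IsValidName("-----"));      // False
        Console.WriteLine(IsValidName("''''"));       // False
        Console.WriteLine(IsValidName("R2D2"));       // False (digits are not letters)
    }
}
```

Because the lookahead covers the "only dashes and apostrophes" case, the extra .Matches calls from the question should no longer be needed.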

Regular expression for any characters without spaces

What is the regular expression for all characters except whitespace, with a minimum of 6 characters?
This is what I have now :
^[\w'?#&#.]{6,}$
But this does not accept all the special characters. I am using it in a .NET app, if that makes any difference.
^[^\s]{6,}$ should do it (note the leading ^; without it, only the last six characters would be checked). If you only want to exclude the space character itself, use [^ ] instead. The notation [^\s] excludes any whitespace character (space, tab, newline, and so on).
A .NET regex to match any string that does not contain any whitespace chars (at least 6 occurrences) is
\A\S{6,}\z
Do not use $ because it may match before a final \n (LF symbol) inside a string; \z is the most appropriate anchor here as it matches the very end of the string. To make the pattern compatible with JS (if you use it in ASP.NET for both server and client side validation) you need to use ^\S{6,}$(?!\n).
The \S shorthand character class matches any character other than a Unicode whitespace char (if ECMAScript option is not used).
The {6,} limiting quantifier matches six or more occurrences of the quantified subpattern.
Details
\A - (an unambiguous anchor, its behavior cannot be redefined with any regex options) start of a string
\S{6,} - any 6 or more chars other than a Unicode whitespace char
\z - the very end of the string.
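The difference between $ and \z is easiest to see with an input that ends in a newline. The sketch below compares both anchors; NoWhitespaceCheck and its two methods are illustrative names, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NoWhitespaceCheck
{
    // Strict: \A and \z match only at the very start and very end.
    public static bool IsValidStrict(string s) => Regex.IsMatch(s, @"\A\S{6,}\z");

    // Loose: without RegexOptions.Multiline, $ also matches before a final \n.
    public static bool IsValidLoose(string s) => Regex.IsMatch(s, @"^\S{6,}$");

    public static void Main()
    {
        Console.WriteLine(IsValidStrict("p@ss!word")); // True
        Console.WriteLine(IsValidStrict("abc def"));   // False (contains a space)
        Console.WriteLine(IsValidStrict("abcde"));     // False (only 5 characters)
        Console.WriteLine(IsValidLoose("abcdef\n"));   // True  (trailing newline slips through)
        Console.WriteLine(IsValidStrict("abcdef\n"));  // False
    }
}
```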

C# Regex boundary with special characters

I want to have a Regex that finds "Attributable".
I tried @"\bAttributable\b" but the \b boundary doesn't work with special characters.
For example, it wouldn't differentiate Attributable and Non-Attributable. Is there any way to write a regex that matches Attributable and not its negative?
Use a negative look-behind:
(?<!-)\bAttributable\b
Obviously this only checks for -s. If you want to check for other characters, put them in a character class in the negative look-behind:
(?<![-^])\bAttributable\b
Alternatively, if you just want to not match Non-Attributable but do match SomethingElse-Attributable, then put Non- in the look-behind:
(?<!Non-)\bAttributable\b
There are several ways to fix an issue like yours, but it all depends on the real requirements. It is sometimes necessary to specify exactly which "word boundary" you need in each concrete case, since the \b word boundary is 1) context-dependent, and 2) matches specific places in the string that you should be aware of:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Now, here are several approaches that you may follow:
When you only care about compound words usually joined with hyphens (similar to @Sweeper's answer): (?<!-)\bAttributable\b(?!-)
Only match between whitespaces or start/end of string: (?<!\S)Attributable(?!\S). NOTE: Actually, if it is what you want, you may do without a regex by using s.Split().Contains("Attributable")
Only match if not preceded with punctuation and there is no letter/digit/underscore right after: (?<!\p{P})Attributable\b
Only match if not preceded by punctuation symbols other than some specific ones (say, you want to match the word after a comma or a semicolon): (?<![^\P{P},;])Attributable\b.
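A short sketch comparing three of the patterns discussed above on a few invented inputs. BoundaryDemo and its method names are hypothetical and only serve to label each pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Rejects any occurrence directly preceded by a hyphen.
    public static bool NotAfterHyphen(string s) =>
        Regex.IsMatch(s, @"(?<!-)\bAttributable\b");

    // Rejects only the specific prefix "Non-".
    public static bool NotAfterNon(string s) =>
        Regex.IsMatch(s, @"(?<!Non-)\bAttributable\b");

    // Requires whitespace (or start/end of string) on both sides.
    public static bool WhitespaceDelimited(string s) =>
        Regex.IsMatch(s, @"(?<!\S)Attributable(?!\S)");

    public static void Main()
    {
        Console.WriteLine(NotAfterHyphen("Attributable costs"));      // True
        Console.WriteLine(NotAfterHyphen("Non-Attributable costs"));  // False
        Console.WriteLine(NotAfterNon("Non-Attributable"));           // False
        Console.WriteLine(NotAfterNon("Partly-Attributable"));        // True
        Console.WriteLine(WhitespaceDelimited("is Attributable now")); // True
        Console.WriteLine(WhitespaceDelimited("(Attributable)"));      // False
    }
}
```

Which variant fits depends on what should count as a separate word in your data, as the answer above points out.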

Ignore spaces at the end of a string

I use the following regex, which is working, but I want to add a condition so as to accept spaces at the end of the value. Currently it is not working.
What am I missing here?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$
Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
This matches any amount of whitespace at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, digits and connector characters like "_"). In .NET it corresponds to [\p{L}\p{Mn}\p{Nd}\p{Pc}] written as a character class with Unicode properties.
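The three variants above (any trailing whitespace, trailing spaces only, and the Unicode version) can be compared side by side. This is only a sketch; TrailingSpaceDemo and its method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingSpaceDemo
{
    // ASCII identifier followed by any amount of whitespace.
    public static bool AnyWhitespace(string s) =>
        Regex.IsMatch(s, @"^[a-zA-Z][a-zA-Z0-9_]+\s*$");

    // ASCII identifier followed by space characters only.
    public static bool SpacesOnly(string s) =>
        Regex.IsMatch(s, @"^[a-zA-Z][a-zA-Z0-9_]+ *$");

    // Unicode letters and word characters, then any whitespace.
    public static bool UnicodeLetters(string s) =>
        Regex.IsMatch(s, @"^\p{L}\w+\s*$");

    public static void Main()
    {
        Console.WriteLine(AnyWhitespace("user_1   "));  // True
        Console.WriteLine(AnyWhitespace("user_1\t"));   // True  (\s includes tab)
        Console.WriteLine(SpacesOnly("user_1\t"));      // False (tab is not a space)
        Console.WriteLine(AnyWhitespace("1user"));      // False (must start with a letter)
        Console.WriteLine(AnyWhitespace("Zo\u00EB"));   // False (non-ASCII letter)
        Console.WriteLine(UnicodeLetters("Zo\u00EB ")); // True
    }
}
```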
Why two dollar signs?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this :
"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar.
Try this -
"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this -
"^[a-zA-Z][a-zA-Z0-9_]+((\s){0,})$"
$ marks the end of the input. If you want to match $ as a literal character, escape it with \.

Escaping hash and quote to regular expression

I am trying to define a regular expression to use with a regular expression validator that limits the content of a textbox to only alphanumeric characters, slash (/), hash (#), left and right parentheses (()), period (.), apostrophe ('), quote ("), hyphen (-) and spaces.
I am having trouble with the hash and quote. The other restrictions are working, but when I insert one of these chars the evaluation fails and I get the error message. I have tried both without escaping these characters and with a verbatim string, which was my last attempt.
@"[ a-zA-ZÀ-ÿ/().\'-""#]"
Any thoughts on these? Thank you
The regex language is smart enough to understand that periods and parentheses within a character class actually refer to the characters and not to the patterns they usually do when they appear outside of character classes.
Within your character class, you need to escape the backslash (\) and the hyphen (-), but that's it:
@"[ a-zA-ZÀ-ÿ/().\\'\-""#]"
If you move your hyphen to the end of the character class, you won't even need to escape that:
@"[ a-zA-ZÀ-ÿ/().\\'""#-]"
And of course this still only matches a single character. If you want to ensure that the entire string consists only of these characters, you'll need to use start (^) and end ($) anchors and a quantifier (* or +) after your character class.
I believe your final pattern should look like this:
@"^[ a-zA-ZÀ-ÿ/().\\'""#-]*$"
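A quick way to sanity-check the final pattern is to run it against a few sample values. The sketch below does that; TextBoxValidator and IsValid are hypothetical names, and the inputs are invented. Note that the pattern as written does not include 0-9, so digits are rejected; add 0-9 to the class if they should be accepted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextBoxValidator
{
    // In a verbatim string, "" is one quote character and \\ reaches the
    // regex engine as an escaped backslash. The trailing - is a literal hyphen.
    private static readonly Regex Allowed =
        new Regex(@"^[ a-zA-ZÀ-ÿ/().\\'""#-]*$");

    // Hypothetical helper for illustration.
    public static bool IsValid(string s) => Allowed.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValid("Apt #B (rear)"));    // True
        Console.WriteLine(IsValid("say \"hi\" - ok"));  // True
        Console.WriteLine(IsValid("Caf\u00E9"));        // True (accented letter in the À-ÿ range)
        Console.WriteLine(IsValid("a*b"));              // False
        Console.WriteLine(IsValid("Suite 4"));          // False (digits are not in the class)
    }
}
```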
