How to check for special characters using regex - c#

I have created a regex validator to check for special characters, because I do not want any special characters in the username. The following is the code:
Regex objAlphaPattern = new Regex(@"[a-zA-Z0-9_#.-]");
bool sts = objAlphaPattern.IsMatch(username);
If I provide the username $%^&asghf, the validator reports an invalid data format, which is the result I want. But if I provide s23_#.-^&()%^$#, the validator should block it and instead allows it, which is wrong.
So how do I disallow all special characters except a-z A-Z 0-9 _ # . -?
Thanks
Sunil Kumar Sahoo

There are a few things wrong with your expression. First, you don't have the start-of-string anchor ^ and the end-of-string anchor $ at the beginning and end of your expression, meaning that it only has to find a match somewhere within your string.
Second, you're currently matching only one character. To force a match of all the characters you'll need to add the * quantifier. Here's what it should be:
Regex objAlphaPattern = new Regex(@"^[a-zA-Z0-9_#.-]*$");
bool sts = objAlphaPattern.IsMatch(username);

Your pattern checks only whether the given string contains any "non-special" character; it does not exclude the unwanted characters. You want to change two things: make it check that the whole string contains only allowed characters, and make it accept more than one character:
^[a-zA-Z0-9_#.-]+$
Added ^ before the pattern to make it start matching at the beginning of the string. Also added +$ after, + to ensure that there is at least one character in the string, and $ to make sure that the string is matched to the end.

Change your regex to ^[a-zA-Z0-9_#.-]+$. Here ^ denotes the beginning of a string, $ is the end of the string.
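
A minimal sketch of the anchored pattern from the answers above, checked against the two usernames from the question. The class and method names (UsernameCheck, IsValid) are made up for illustration, and the character class is copied as it appears in the question.

```csharp
using System.Text.RegularExpressions;

public static class UsernameCheck
{
    // ^ and $ anchor the match, so every character of the input must come
    // from the character class; + requires at least one character.
    private static readonly Regex Allowed = new Regex(@"^[a-zA-Z0-9_#.-]+$");

    // Hypothetical helper: true only when the whole username is made of
    // allowed characters.
    public static bool IsValid(string username)
    {
        return Allowed.IsMatch(username);
    }
}
```

With this pattern both $%^&asghf and s23_#.-^&()%^$# should be rejected, while s23_#.- is accepted. Using * instead of + would additionally accept the empty string.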

Validate string to have only one occurrence of `-` or `.` special characters inside it

I have a scenario where I need to validate a given string, which may contain a - or . special character only in the middle.
It cannot have - or . at the beginning or end of the string.
Either - or . is allowed in a given string, but not both at the same time (at most one occurrence is allowed).
I used Regex to validate given string.
Regex regex = new Regex(@"(^[a-zA-Z]+[.|-]?([a-zA-Z]+)$)");
and I validate the string by passing it to the IsMatch method:
regex.IsMatch(givenString)
Above is the solution I came up with. Is there a better way to validate this scenario?
Thanks
Your current regex does not allow single-character input, although a string such as a should be valid. You may fix it by making the second part an optional group, using a grouping construct with a ? quantifier applied to it:
var regex = new Regex(@"^[a-zA-Z]+(?:[.-][a-zA-Z]+)?$");
// or, using the case insensitive modifier
var regex = new Regex(@"^[A-Z]+(?:[.-][A-Z]+)?$", RegexOptions.IgnoreCase);
Details
^ - start of string
[a-zA-Z]+ - 1 or more ASCII letters
(?:[.-][a-zA-Z]+)? - one or zero repetitions of . or - followed with 1 or more ASCII letters
$ - end of string (or \z in case you do not want to allow a match before a trailing \n).
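
As a rough illustration of the pattern explained above, here is a small sketch that uses the \z variant so a trailing newline is not accepted. The names SeparatorCheck and IsValid are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SeparatorCheck
{
    // One or more letters, optionally followed by exactly one '.' or '-'
    // and one or more letters. \z matches only at the very end of the input.
    private static readonly Regex Pattern =
        new Regex(@"^[A-Z]+(?:[.-][A-Z]+)?\z", RegexOptions.IgnoreCase);

    // Hypothetical helper wrapping IsMatch.
    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

Inputs such as a, ab-cd and ab.cd should pass, while -abc, abc., a--b and a-b.c should fail because the separator is at an edge or occurs more than once.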

How can i make my regular expression work?

I am new to both .NET (C#) and regular expressions.
I need a regular expression to match against a url:
If url string contains "/id/Whatever_COMES_HERE_EVERY_CHAR_ACCEPTED/" : return true
If url string contains only "/id/" : return false
I have tried the following, but it only returns true if the URL is http://localhost/id/
This is my script:
string thisUrl = HttpContext.Current.Request.Url.AbsolutePath;
Match match = Regex.Match(thisUrl, @"/id/*$");
What am I doing wrong?
You have this:
/id/*$
This matches the literal string /id followed by / repeated zero or more times (the * quantifier applies only to the character directly before it), and then $, which means the end of the string.
So you are looking for repetition of the literal /, which is not what you want. (This means http://localhost/id/////////////////// would also match your original regex.)
What you need is something like this:
/id/.+$
This will match the literal /id/ followed by the . which in regex means any character which is quantified with the + which means 1 or more.
You could tighten it up and use \S instead of . to match only non-whitespace characters (since a URL shouldn't contain whitespace).
Also note: there are a variety of online regex tools which are really useful when trying to figure out and test a regex. A couple of examples:
http://rubular.com/
http://regex101.com/
http://www.regxlib.com/
There is even an extension for Visual Studio you can use:
https://visualstudiogallery.msdn.microsoft.com/bf883ae3-188b-43bc-bd29-6235c4195d1f
When you use the star, it signals that zero or more of the preceding character may be present. You will want to use
"/id/.+" to signal that at least one more character must come after the /.
If you're just looking for a true/false answer, you should use the IsMatch() method. The other issue is that * (zero or more) and + (one or more) are quantifiers and apply to whatever precedes them: a literal character, a character class, or a group. The dot (.) matches any character except a newline by default. So the correct solution for your problem would be:
Regex.IsMatch(thisUrl, @"/id/.+$");
Considering that the input is a URL, this regex can be improved upon by restricting character classes to valid URL characters only, but for your purpose the above should be sufficient.
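
A short sketch of the IsMatch approach suggested above. The wrapper class IdUrlCheck and its HasId method are illustrative names only, and the sample paths are assumptions rather than values from the question.

```csharp
using System.Text.RegularExpressions;

public static class IdUrlCheck
{
    // "/id/" followed by at least one more character up to the end of the string.
    public static bool HasId(string path)
    {
        return Regex.IsMatch(path, @"/id/.+$");
    }
}
```

HasId("/id/") should return false and HasId("/id/abc/") should return true. Note that . also matches a slash, so a path like /id// passes as well; a stricter character class would be needed to rule that out.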

Remove non-alphanumeric characters from start and end of string only

I am trying to clean up some data using a helper exe (C#).
I iterate through each string and I want to remove invalid characters from the start and end of the string i.e. remove the dollar symbols from $$$helloworld$$$.
This works fine using this regular expression: \W.
However, strings which contain invalid character in the middle should be left alone i.e. hello$$$$world is fine and my regular expression should not match this particular string.
So in essence, I am trying to figure out the syntax to match invalid characters at the start and the end of a string, while leaving alone the strings which contain invalid characters in their body.
Thanks for your help!
This does it!
(^[\W_]*)|([\W_]*$)
This regex says: match zero or more non-word characters or underscores at the start (^) or (|) at the end ($).
The following should work:
^\W+|\W+$
^ and $ are anchors to the beginning and end of the string respectively. The | in the middle is an OR, so this regex means "either match one or more non-word characters at the start of the string, or match one or more non-word characters at the end of the string".
Use ^ to match the start of the string, and $ to match the end of the string.
Try this one,
(^[^\w]*)|([^\w]*$)
Use ^ to match 'beginning of line' and $ to match 'end of line', i.e. your code should match and remove ^\W* and \W*$
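
The answers above give the pattern but not the removal step. One way to apply it, sketched here with Regex.Replace and the ^\W+|\W+$ variant, is shown below; EdgeTrimmer and Trim are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class EdgeTrimmer
{
    // A run of non-word characters at the start, or a run at the end.
    private static readonly Regex Edges = new Regex(@"^\W+|\W+$");

    // Removes the matched runs and leaves the middle of the string untouched.
    public static string Trim(string input)
    {
        return Edges.Replace(input, string.Empty);
    }
}
```

Trim("$$$helloworld$$$") should give helloworld, and Trim("hello$$$$world") should come back unchanged. Because \W does not match the underscore, leading or trailing underscores stay in place; the [\W_] variant removes them too.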

Excluding certain patterns in a regex

I'm working on a Regex in C# to exclude certain patterns within a string.
The patterns I want to accept are "%00" (hex 00-FF) and any other character without a starting '%'. The patterns I would like to exclude are "%0" (a % followed by only one hex character) and/or any of the characters "&<>'/".
So far I have this
Regex correctStringRegex = new Regex(#"(%[0-9a-fA-F]{2})|[^%&<>'/]|(^(%.))",
RegexOptions.IgnoreCase);
Below are examples of what I'm trying to pass and reject.
Passing String %02This is%0A%0Da string%03
Reject String %0%0Z%A&<%0a%
If a string doesn't pass all the requirements I would like to reject the whole string completely.
Any Help would be greatly appreciated!
I suggest this:
^(?:%[0-9a-f]{2}|[^%&<>'/])*$
Explanation:
^ # Start of string
(?: # Match either
%[0-9a-f]{2} # %xx
| # or
[^%&<>'/] # any character except the forbidden ones
)* # any number of times
$ # until end of string.
This ensures that % is only matched when followed by two hexadecimals. Since you're already compiling the regex with the IgnoreCase flag set, you don't need a-fA-F, either.
Hmm, given the comments so far, I think you need a different problem definition. You want to pass or fail a string, using regex, based on whether or not the string contains any invalid patterns. I'm assuming a string will fail if there is ANY invalid pattern, rather than the reverse of a string passing if there is any valid pattern.
As such, I would use this regex: %(?![0-9a-f]{2})|[&<>'/]
You would then run this in such a way that a string is invalid if you GET a match, a valid string will not have any matches in this set.
A quick explanation of a rather odd regex. The construct (?!...) is a negative lookahead: it succeeds at a position only if the pattern inside it does NOT match from there, i.e. "match if suffix not present". So what I'm telling it to look for is any instance of % that is not followed by two hex characters, or any of the other invalid characters. The assumption is that anything that DOESN'T match this regex is a valid character entry.
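
To compare the two suggestions side by side, here is a small sketch that runs both against the passing and rejected strings from the question. The names EscapeCheck, PassesWhitelist and PassesBlacklist are invented for this example, and RegexOptions.IgnoreCase is assumed for both patterns.

```csharp
using System.Text.RegularExpressions;

public static class EscapeCheck
{
    // Whole-string check: only %xx escapes and permitted characters.
    private static readonly Regex Valid =
        new Regex(@"^(?:%[0-9a-f]{2}|[^%&<>'/])*$", RegexOptions.IgnoreCase);

    // Violation search: a '%' not followed by two hex digits,
    // or any forbidden character.
    private static readonly Regex Invalid =
        new Regex(@"%(?![0-9a-f]{2})|[&<>'/]", RegexOptions.IgnoreCase);

    // The string is accepted when the anchored pattern matches.
    public static bool PassesWhitelist(string s)
    {
        return Valid.IsMatch(s);
    }

    // The string is accepted when no violation is found.
    public static bool PassesBlacklist(string s)
    {
        return !Invalid.IsMatch(s);
    }
}
```

Both helpers should accept %02This is%0A%0Da string%03 and reject %0%0Z%A&<%0a%. The second form has the side benefit that Invalid.Match can report where the first offending character is.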

why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?
If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef-, the regex will match the substring abcdef just fine and ignore the other characters, because you didn't tell it to match only at the start or end of the string.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
In regular expressions, the + tells the engine to match one or more of the preceding element.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters', but it would still allow strings that include non-alphanumerics, so long as those characters are not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that consists entirely of one or more alphanumeric characters, from beginning to end'. Anchoring at both ends is what completely excludes non-alphanumerics from the string.
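
The difference between the unanchored and anchored forms can be seen with the sample strings used in the answers above. This is only a sketch; AnchorDemo and its two methods are made-up names.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Unanchored: succeeds if a run of letters or digits occurs anywhere.
    public static bool ContainsAlphanumeric(string s)
    {
        return Regex.IsMatch(s, "[a-zA-Z0-9]+");
    }

    // Anchored: succeeds only if the entire string is letters or digits.
    public static bool IsAlphanumeric(string s)
    {
        return Regex.IsMatch(s, "^[a-zA-Z0-9]+$");
    }
}
```

ContainsAlphanumeric accepts a-b and _-abcdef- but rejects -__--, whereas IsAlphanumeric rejects all three and accepts a string such as abc123.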
