Simple Regex help needed - c#

I need to create a regular expression to match a string that contains anything other than the specified characters. The characters are
a-z A-Z 0-9
+ - * / : . $ %
and a space
I'm not very familiar with regex so I'm unsure how to put it together and test it. I can find lots of cheat sheets but I don't know how to actually structure it as one whole pattern.

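A ^ at the start of a character class negates it, so the class matches any single character that is not listed. So:
[^a-zA-Z0-9+\-*/:.$% ]
matches if the string contains at least one character outside the allowed set. Inside a character class most regex metacharacters are literal; the hyphen is escaped with \ so it is not read as a range.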

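^[a-z0-9+\-*/:.$%\x20]*$
Starting with ^ and ending with $ makes sure that the string contains only allowed characters, so a string that does not match contains something else. The class must not be negated here: ^[^...]*$ would instead match strings made up only of disallowed characters.
\x20 stands for a space; to match any whitespace use \s. The pattern should be case-insensitive: the ~...~i delimiters and i modifier are PHP syntax, so in C# pass RegexOptions.IgnoreCase instead. You may test your regular expressions here http://regex.larsolavtorvik.com/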
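A minimal C# sketch of both approaches, using the full allowed set from the question (letters, digits, + - * / : . $ % and a space). The class and method names are hypothetical. The validation pattern uses \z instead of $, because in .NET $ also matches just before a final "\n", which would let a trailing newline slip through.

```csharp
using System.Text.RegularExpressions;

public static class AllowedChars
{
    // Negated class: matches any single character that is NOT a letter,
    // a digit, one of + - * / : . $ %, or a space.
    private static readonly Regex Disallowed =
        new Regex(@"[^a-zA-Z0-9+\-*/:.$% ]");

    // Anchored, non-negated class: the whole string must consist of allowed
    // characters. IgnoreCase is the C# equivalent of the PHP "i" modifier.
    private static readonly Regex OnlyAllowed =
        new Regex(@"^[a-z0-9+\-*/:.$% ]*\z", RegexOptions.IgnoreCase);

    // True if at least one character outside the allowed set appears.
    public static bool ContainsDisallowed(string input) => Disallowed.IsMatch(input);

    // True if every character is in the allowed set (empty string included).
    public static bool IsValid(string input) => OnlyAllowed.IsMatch(input);
}
```

The two methods are complements of each other, so pick whichever reads more naturally at the call site.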

Related

Ignore spaces at the end of a string

I use the following regex, which is working, but I want to add a condition so as to accept spaces at the end of the value. Currently it is not working.
What am I missing here?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$
Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
This matches any amount of whitespace (including none) at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, nonspacing marks, decimal digits and connector punctuation like "_"), equivalent to the character class [\p{L}\p{Mn}\p{Nd}\p{Pc}] written with Unicode properties. See the definition on MSDN.
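A small C# sketch comparing the ASCII pattern that allows only trailing spaces (^[a-zA-Z][a-zA-Z0-9_]+ *$) with the Unicode one (^\p{L}\w+\s*$). The class and method names are hypothetical. Note that both patterns need at least two characters before any trailing whitespace, because [a-zA-Z0-9_]+ and \w+ require one or more.

```csharp
using System.Text.RegularExpressions;

public static class IdentifierCheck
{
    // ASCII letter, then one or more letters/digits/underscores,
    // then any number of plain spaces (tabs and newlines are rejected).
    public static bool IsAsciiIdentifier(string input) =>
        Regex.IsMatch(input, @"^[a-zA-Z][a-zA-Z0-9_]+ *$");

    // Unicode letter, then one or more word characters,
    // then any amount of whitespace (spaces, tabs, newlines, ...).
    public static bool IsUnicodeIdentifier(string input) =>
        Regex.IsMatch(input, @"^\p{L}\w+\s*$");
}
```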
why two dollars?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this:
@"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar.
Try this -
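@"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this -
@"^[a-zA-Z][a-zA-Z0-9_]+((\s){0,})$"
The first allows at most one trailing whitespace character, the second any number. $ indicates the end of the string; if you are looking for $ as a literal character, escape it with \.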

C# Regex method explains

new Regex(@"\n|\r|\\|<|>|\*|!|\$|%|;");
I have the regex example above, but I can't really understand what it is trying to find. Can anyone give me a hand, please?
The regex matches one of the characters separated by the alternation operator |. There are a few special characters (like \n or \r for newline and carriage return, or \$ for a literal dollar sign and \* for a literal asterisk because $ and * are regex metacharacters), but other than that, it's quite straightforward.
That said, for matching a single character out of a list of valid characters, a character class is usually the better choice, not only because there is less need to escape the metacharacters:
new Regex(@"[\n\r\\<>*!$%;]");
It'll try to match any of the special characters listed: \n, \r, \, <, >, *, !, $, % and ;. The | is the regex OR operator.
Some characters need to be escaped with an extra \ because they have a special meaning in the regex language (\, $, ...).
| in regex is an alternation operator. A|B means match either A or B. It can also be written using a character class - [AB] which also means the same thing.
The benefit of using a character class is that you don't need to escape most regex metacharacters inside it (only a few, such as \, ], ^ and -, are special there), which you have to do outside, as you did for *. So, your regex can be shortened to:
new Regex(@"[\n\r\\<>*!$%;]");
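To see that the alternation and the character class find exactly the same characters, here is a minimal C# sketch that lists every match. The class and method names are hypothetical; both regexes are the ones from this thread, written as verbatim @"" strings.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SpecialCharDemo
{
    // Alternation: each alternative is a single (sometimes escaped) character.
    public static readonly Regex Alternation = new Regex(@"\n|\r|\\|<|>|\*|!|\$|%|;");

    // Character class: * and $ need no escaping here, but \ still does.
    public static readonly Regex CharClass = new Regex(@"[\n\r\\<>*!$%;]");

    // Returns the text of every match, in order of appearance.
    public static string[] Found(Regex regex, string input) =>
        regex.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();
}
```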

Regular expressions to extract punctuations/characters

Hi guys, I need a regex that only matches punctuation/special characters.
I have this so far:
[._^%$#!~#,-]+
but this matches as soon as there is at least one punctuation character, and still allows any other character (digit or letter).
I need it to only allow punctuation/special characters.
Try this:
^[\p{S}\p{P}]+$
\p{S} matches any symbol character, and \p{P} matches any punctuation.
Note that your pattern will not match symbols and punctuation characters that are not in its list.
Try anchoring the regex to the start and the end of the string (unless you're using Multiline matching) - i.e. ^ at the beginning and $ at the end:
^[._^%$#!~#,-]+$
Note - this does not endorse your actual pattern (I can't say whether it matches all the 'special characters' you're talking about), but it will make it so that the entire string must be all 'special'.
You can try something like ^[^a-zA-Z0-9]+$ instead. It should NOT accept letters or digits, which saves writing out every allowed character besides the ones you typed.
\W
Matches any character that is not a word character (alphanumeric & underscore).
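A C# sketch contrasting the anchored Unicode-category answer with the question's unanchored list. The class and method names are hypothetical, and the duplicate # from the question's class is dropped since it has no effect. \z is used instead of $ so that a trailing newline is not accepted.

```csharp
using System.Text.RegularExpressions;

public static class PunctuationCheck
{
    // Whole string must be symbols (\p{S}) or punctuation (\p{P});
    // letters, digits and whitespace make it fail.
    public static bool IsOnlySymbolsOrPunctuation(string input) =>
        Regex.IsMatch(input, @"^[\p{S}\p{P}]+\z");

    // The question's pattern without anchors: succeeds if ANY listed
    // character appears, regardless of what else is in the string.
    public static bool ContainsListedChar(string input) =>
        Regex.IsMatch(input, @"[._^%$#!~,-]+");
}
```

The second method shows why the original attempt seemed to "still allow" letters and digits: without anchors, one matching character anywhere is enough.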

Regexp Remove any non alphanumeric, but leave some special characters in one expression

I have this code that replaces all non alphanumeric characters with "-" char.
return Regex.Replace(strIn, @"[\W|_]+", "-", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
but I need to change it to let some special characters (one or more) pass through, for example: #, *, %.
How can I change this regular expression?
Use
[^\p{L}\p{N}#*%]+
This matches one or more characters that are neither letters nor digits nor any of #, * or %.
Another option is character class subtraction, for example to remove # from the character class:
[\W_-[#]]+
Just add other accepted special chars after the #. Live example here: http://rextester.com/rundotnet?code=YFQ40277
How about this one:
[^a-zA-Z0-9#*%]+
If you need Unicode support, you can use (as in Tim's answer):
[^\p{L}\p{N}#*%]+
Use this.
([^\w#*%]|_)
Add any other special characters after the %.
It is basically saying, match any character that is not (^) a word character(\w), #, * or % OR match _.
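It seems this way is the best solution for you:
@"(?!.*[^\w#*%])"
Note that a lookahead is zero-width, so this can validate a string (anchor it with ^) but cannot by itself replace the unwanted characters.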
You can use set subtraction for that:
@"[\W_-[#*%]]+"
This matches the set of all non-word characters and the underscore, minus the set of #, * and %.
Note that you don't have to use | for "or" in a character class, since that's implied. In fact, the | in your regex just matches |.
Note also that in .NET, \w matches a few other "connector punctuation" characters besides the underscore. If you want to match the other characters too, you can use
@"[\W\p{Pc}-[#*%]]+"
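A minimal C# sketch of the replacement with two of the suggested patterns, class subtraction and the negated Unicode class. The class and method names are hypothetical. The two patterns agree on the sample inputs in the test, but they are not identical in general (for example \w also covers nonspacing marks, and \p{N} covers more than decimal digits).

```csharp
using System.Text.RegularExpressions;

public static class Slugify
{
    // Class subtraction: all non-word characters plus "_", minus # * %.
    // Each run of such characters becomes a single "-".
    public static string WithSubtraction(string input) =>
        Regex.Replace(input, @"[\W_-[#*%]]+", "-");

    // Negated class: anything that is not a letter, a number, #, * or %.
    public static string WithNegatedClass(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{N}#*%]+", "-");
}
```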

Regular Expression: single word

I want to check in a C# program whether a user input is a single word. The word may only have the characters A-Z and a-z. No spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
...
}
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
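A minimal C# sketch of the corrected check next to the question's pattern. The class and method names are hypothetical. One deviation from the answer: it anchors the end with \z rather than $, because in .NET $ also matches just before a final "\n", so ^[A-Za-z]+$ would accept "abc\n".

```csharp
using System.Text.RegularExpressions;

public static class SingleWord
{
    // ^ and \z anchor the whole input; + requires at least one letter.
    private static readonly Regex LettersOnly = new Regex(@"^[A-Za-z]+\z");

    public static bool IsSingleWord(string userInput) => LettersOnly.IsMatch(userInput);

    // The question's pattern, for comparison: unanchored and zero-or-more,
    // so it finds an (empty) match in any string at all.
    public static bool QuestionPattern(string userInput) =>
        Regex.IsMatch(userInput, "[A-Za-z]*");
}
```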
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing: it succeeds on any string, because even zero letters counts as a match.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
I don't know C#'s regex syntax, but try [A-Za-z]+.
Try ^[A-Za-z]+$. If you don't include the ^ and $, it will match on any part of the string that has alphabetic characters in it.
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
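public bool IsSingleWord(string userInput)
{
    // ^\b requires the string to start with a word character;
    // .+? then lazily matches up to the next word boundary.
    Regex firstWordRegex = new Regex(@"^\b.+?\b");
    Match firstWordMatch = firstWordRegex.Match(userInput);
    // It is a single word only if the first word spans the entire input.
    return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}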
The others have written how to solve the problem you know about. Now I'll speak about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter and \p{M}* is zero or more diacritic (combining) marks. In Unicode, diacritics can be "separated" from base letters, so you can have something like a + ` = à, or you can have precomposed characters like the standard à.
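A C# sketch showing that the Unicode pattern accepts both the precomposed "é" (U+00E9) and "e" followed by a combining acute accent (U+0301), while the ASCII-only pattern rejects them. The class and method names are hypothetical; the group is made non-capturing and \z replaces $ so a trailing newline is rejected.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeWord
{
    // One or more letters, each optionally followed by combining marks.
    // \p{L} and \p{M} do not overlap, so the repetition cannot backtrack badly.
    private static readonly Regex Word = new Regex(@"^(?:\p{L}\p{M}*)+\z");

    public static bool IsSingleWord(string userInput) => Word.IsMatch(userInput);
}
```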
If you only need the characters a-z and A-Z, you could simply iterate over the string and check whether each character falls inside those ranges,
for example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This avoids the regex engine entirely and may be faster.
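The loop described above, written out as runnable C#. The class and method names are hypothetical; it also rejects the empty string, matching the ^[A-Za-z]+$ answers rather than the original [A-Za-z]*.

```csharp
public static class AsciiWord
{
    // Non-empty, and every character is in a-z or A-Z. No regex involved.
    public static bool IsAsciiWord(string userInput)
    {
        if (string.IsNullOrEmpty(userInput))
            return false;

        foreach (char c in userInput)
        {
            bool isAsciiLetter = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
            if (!isAsciiLetter)
                return false; // stop at the first character outside the ranges
        }
        return true;
    }
}
```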
