C# Regex method explains - c#

new Regex(@"\n|\r|\\|<|>|\*|!|\$|%|;");
I have the regex example above, but I can't really understand what it is trying to find. Can anyone give me a hand, please?

The regex matches one of the characters separated by the alternation operator |. There are a few special characters (like \n or \r for newline and carriage return, or \$ for a literal dollar sign and \* for a literal asterisk because $ and * are regex metacharacters), but other than that, it's quite straightforward.
That said, for matching a single character out of a list of valid characters, a character class is usually the better choice, not only because there is less need to escape the metacharacters:
new Regex(@"[\n\r\\<>*!$%;]");

It'll try to match any of the special characters listed: \n, \r, \, <, >, *, !, $, %, ;. The | is the regex OR operator.
Some characters need to be escaped with an extra \ because they have a meaning in the regex language (\, $, ...).

| in regex is an alternation operator. A|B means match either A or B. When A and B are single characters, it can also be written using a character class, [AB], which means the same thing.
The benefit of using a character class is that most regex metacharacters (such as * and $) don't need to be escaped inside it, which you have to do outside, as you did for *. The backslash itself still has to be escaped. So, your regex can be shortened to:
new Regex(@"[\n\r\\<>*!$%;]");
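
To see that the two forms behave the same, here is a minimal sketch. It assumes .NET 6 or later (top-level statements); the helper name CountSpecial and the sample input are made up for illustration, while both patterns come from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: counts how many matches the pattern finds in the input.
int CountSpecial(string pattern, string input) => Regex.Matches(input, pattern).Count;

string alternation = @"\n|\r|\\|<|>|\*|!|\$|%|;";   // pattern from the question
string charClass = @"[\n\r\\<>*!$%;]";              // character-class form from the answers

// Made-up sample containing <, >, ;, a backslash and $.
string sample = "a<b>c;\\d$";

Console.WriteLine(CountSpecial(alternation, sample)); // 5
Console.WriteLine(CountSpecial(charClass, sample));   // 5
```

Both calls find the same five characters, so the choice between the two forms is mostly about readability.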

Related

Ignore spaces at the end of a string

I use the following regex, which is working, but I want to add a condition so as to accept spaces at the end of the value. Currently it is not working.
What am I missing here?
^[a-zA-Z][a-zA-Z0-9_]+\s?$[\s]*$
Assumption: you added the two end of string anchors $ by mistake.
? quantifier, matching one or zero repetitions, makes the previous item optional
* quantifier, matching zero or more repetitions
So change your expression to
^[a-zA-Z][a-zA-Z0-9_]+\s*$
This matches any amount of whitespace at the end of the string.
Be aware, whitespace is not just the space character, it is also tabs and newlines (and more)!
If you really want to match only space, just write a space or make a character class with all the characters you want to match.
^[a-zA-Z][a-zA-Z0-9_]+ *$
or
^[a-zA-Z][a-zA-Z0-9_]+[ \t]*$
Next thing is: Are you sure you only want plain ASCII letters? Today there is Unicode and you can use Unicode properties, scripts and blocks in your regular expressions.
Your expression in Unicode, allowing all letters and digits.
^\p{L}\w+\s*$
\p{L} Unicode property, any kind of letter from any language.
\w shorthand character class for word characters (letters, digits and connector characters like "_") [\p{L}\p{Nd}\p{Pc}] as character class with Unicode properties. Definition on msdn
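
A small sketch of the difference between \s* and a plain space at the end, assuming .NET 6 or later (top-level statements). The helper names IsValidName and IsValidNameSpacesOnly and the sample values are hypothetical; the patterns are the ones given above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers wrapping two of the patterns from the answer.
bool IsValidName(string value) =>
    Regex.IsMatch(value, @"^[a-zA-Z][a-zA-Z0-9_]+\s*$");   // any trailing whitespace

bool IsValidNameSpacesOnly(string value) =>
    Regex.IsMatch(value, @"^[a-zA-Z][a-zA-Z0-9_]+ *$");    // trailing spaces only

Console.WriteLine(IsValidName("user_1   "));            // True
Console.WriteLine(IsValidName("user_1\t"));             // True: \s also covers tabs
Console.WriteLine(IsValidNameSpacesOnly("user_1\t"));   // False: a tab is not a space
Console.WriteLine(IsValidName("1user"));                // False: must start with a letter
```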
Why two dollars?
^[a-zA-Z][a-zA-Z0-9_]+\s*$
or make it this :
"^[a-zA-Z][a-zA-Z0-9_]+\s?\$\s*$"
if you want to literally match the dollar.
Try this -
"^[a-zA-Z][a-zA-Z0-9_]+(\s)?$"
or this -
"^[a-zA-Z][a-zA-Z0-9_]+((\s){0,})$"
$ indicates the end of the string; if you are looking for $ as a character, escape it with \.

Match a string until it meets a '('

I've managed to get everything (well, all letters) up to a whitespace using the following:
@"^.*([A-Z][a-z].*)]\s"
However, I want to match up to a ( instead of a whitespace... how can I manage this?
Without having the '(' in the match
If what you want is to match any character up until the ( character, then this should work:
@"^.*?(?=\()"
If you want all letters, then this should do the trick:
@"^[a-zA-Z]*(?=\()"
Explanation:
^ Matches the beginning of the string
.*? Zero or more of any character. The trailing ? means 'non-greedy',
which means the minimum characters that match, rather than the maximum
(?= This means 'zero-width positive lookahead assertion'. That means that the
contained expression must follow, but won't be included in the match.
\( Escapes the ( character (since it has special meaning in regular
expressions)
) Closes off the lookahead
[a-zA-Z]* Zero or more of any character from a to z, or from A to Z
Reference: Regular Expression Language - Quick Reference (MSDN)
EDIT: Actually, instead of using .*?, as Casimir has noted in his answer, it's probably easier to use [^\(]*. The ^ used inside a character class (a character class is the [...] construct) inverts the meaning, so instead of "any of these characters", it means "any except these characters". So the expression using that construct would be:
@"^[^\(]*(?=\()"
Using a constraining character class is the best way
@"^[^(]*"
[^(] means all characters but (
Note that you don't need a capture group, since what you want is the whole pattern.
You can use this pattern:
([A-Z][a-z][^(]*)\(
The group will match a capital Latin letter, followed by a lower-case Latin letter, followed by any number of characters other than an open parenthesis. Note that ^.* is not necessary.
Or this, which produces the same basic behavior but uses a non-greedy quantifier instead:
([A-Z][a-z].*?)\(
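
The lookahead and capture-group approaches from the answers can be compared with a short sketch, assuming .NET 6 or later (top-level statements). The helper names TextBeforeParen and CapturedBeforeParen and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: everything from the start of the string up to (not including) the first '('.
// Returns an empty string when the input contains no '('.
string TextBeforeParen(string input) =>
    Regex.Match(input, @"^[^(]*(?=\()").Value;

// Hypothetical helper: same idea with a capture group; the '(' is consumed but not captured.
string CapturedBeforeParen(string input) =>
    Regex.Match(input, @"([A-Z][a-z][^(]*)\(").Groups[1].Value;

Console.WriteLine("[" + TextBeforeParen("Foo Bar (baz)") + "]");      // [Foo Bar ]
Console.WriteLine("[" + CapturedBeforeParen("Foo Bar (baz)") + "]");  // [Foo Bar ]
Console.WriteLine("[" + TextBeforeParen("no paren here") + "]");      // []
```

Note that the space before the parenthesis is part of the result; call Trim() on it if that is not wanted.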

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regex that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex, though? So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing but with the dash preceding, while also taking into account that the number following the dash could be negative? I didn't think of this when I first wrote the question. For example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(@"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The @ makes the string a verbatim literal, which eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for one or more digits, followed by a dash, followed by one or more digits, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(@"(?<=\-)\-?\d+");
(?<=\-) - Positive lookbehind which will check and make sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
(?'number'\d+)- will work (no need to escape the dash). In this example, the group containing the number is the named group 'number'.
if you want to match both groups with optional sign try:
@"(?'first'-?\d+)-(?'second'-?\d+)"
See it working here.
Just to describe: nothing complicated, just using -? to match an optional - and \d+ to match one or more digits. A literal - matches itself.
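
The lookahead, lookbehind and named-group patterns from these answers can be tried against the inputs in the question. This is only a sketch, assuming .NET 6 or later (top-level statements); the helper names NumberBeforeDash and NumberAfterDash are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: digits that are immediately followed by a dash (dash not included).
string NumberBeforeDash(string input) => Regex.Match(input, @"\d+(?=-)").Value;

// Hypothetical helper: optionally signed number immediately preceded by a dash.
string NumberAfterDash(string input) => Regex.Match(input, @"(?<=-)-?\d+").Value;

Console.WriteLine(NumberBeforeDash("test test 310-315"));   // 310
Console.WriteLine(NumberAfterDash("test test 310--315"));   // -315
Console.WriteLine(NumberAfterDash("test 310-315"));         // 315

// Named groups capture both numbers in a single match.
Match pair = Regex.Match("test test 310--315", @"(?'first'-?\d+)-(?'second'-?\d+)");
Console.WriteLine(pair.Groups["first"].Value + " / " + pair.Groups["second"].Value); // 310 / -315
```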
here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
in the comments section of that page, it suggests escaping the dash with '\-'
Make sure you escape your escape character \
A dash only has a special meaning (a range) inside a character class, where you escape it with a backslash (\-); outside a class, escaping it is harmless but unnecessary. Since a backslash also has a special meaning in regular C# string literals, each backslash must itself be escaped there, so the pattern would be written "\\d+\\-". With a verbatim string (@"\d+\-") no doubling is needed.
\b\d*(?=\-) You will want to look ahead for the dash.
\b = start at a word boundary
\d = match any decimal digit
* = match the previous item zero or more times
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting

Regexp Remove any non alphanumeric, but leave some special characters in one expression

I have this code that replaces all non alphanumeric characters with "-" char.
return Regex.Replace(strIn, @"[\W|_]+", "-", RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);
but I need to change it to let some special characters (one or more) pass through, for example: #, *, %
How do I change this regular expression?
Use
[^\p{L}\p{N}#*%]+
This matches one or more characters that are neither letters nor digits nor any of #, * or %.
Another option: you can use character class subtraction, for example to remove # from the character class:
[\W_-[#]]+
Just add other accepted special chars after the #. Live example here: http://rextester.com/rundotnet?code=YFQ40277
How about this one:
[^a-zA-Z0-9#*%]+
If you are using unicode you can do (as Tim's answer):
[^\p{L}\p{N}#*%]+
Use this.
([^\w#*%]|_)
Add any other special characters after the %.
It is basically saying, match any character that is not (^) a word character(\w), #, * or % OR match _.
It seems this way is the best solution for you
@"(?!.*[^\w#*%])"
You can use set subtraction for that:
@"[\W_-[#*%]]+"
This matches the set of all non-word characters and the underscore, minus the set of #, * and %.
Note that you don't have to use | for "or" in a character class, since that's implied. In fact, the | in your regex just matches |.
Note also that in .NET, \w matches a few other "connector punctuation" characters besides the underscore. If you want to match the other characters too, you can use
@"[\W\p{Pc}-[#*%]]+"
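
As a rough illustration, the subtraction form and the negated Unicode form can be run side by side. The sketch assumes .NET 6 or later (top-level statements); the helper names and the sample input are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper using character class subtraction:
// all non-word characters and the underscore, minus #, * and %.
string ReplaceWithSubtraction(string input) =>
    Regex.Replace(input, @"[\W_-[#*%]]+", "-");

// Hypothetical helper using the negated class: anything that is not a letter, a digit, #, * or %.
string ReplaceWithNegation(string input) =>
    Regex.Replace(input, @"[^\p{L}\p{N}#*%]+", "-");

string sample = "50% off: C# & more_stuff!";   // made-up input
Console.WriteLine(ReplaceWithSubtraction(sample)); // 50%-off-C#-more-stuff-
Console.WriteLine(ReplaceWithNegation(sample));    // 50%-off-C#-more-stuff-
```

Each run of disallowed characters collapses into a single "-" because of the + quantifier.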

Simple Regex help needed

I need to create a regular expression to match a string that contains anything other than the specified characters. The characters are
a-z A-Z 0-9
+ - * / : . $ %
and a space
I'm not very familiar with regex so I'm unsure how to put it together and test it. I can find lots of cheat sheets but I don't know how to actually structure it as one whole pattern.
A ^ at the start of a character class negates the characters in that class. So:
[^a-zA-Z0-9+\-*/:.$% ]
Inside a character class only a few characters are special; here the - is escaped with \ so that it is not read as a range.
~^[^a-z0-9\+\-\*\/\:\.\$\%\x20]*$~i
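Starting with ^ and ending with $ makes the pattern apply to the whole string. Note that because the class is negated, this matches strings made up entirely of characters outside the allowed set; remove the ^ inside the brackets to check that a string contains only allowed characters.
The character group starts with ^ for negation. \x20 stands for a space; to match any whitespace use \s. This RegExp is case insensitive (i modifier), and the ~ characters are pattern delimiters (as used in PHP), not part of the regex. You may test your regular expressions here http://regex.larsolavtorvik.com/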
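
In C#, a check for "contains anything other than the allowed characters" could look like the sketch below. It assumes .NET 6 or later (top-level statements); the helper name ContainsDisallowed and the sample strings are hypothetical, and the character list is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the input contains at least one character
// outside a-z, A-Z, 0-9, + - * / : . $ % and space.
bool ContainsDisallowed(string input) =>
    Regex.IsMatch(input, @"[^a-zA-Z0-9+\-*/:.$% ]");

Console.WriteLine(ContainsDisallowed("3 + 4 * 2"));     // False
Console.WriteLine(ContainsDisallowed("price: $5.00"));  // False
Console.WriteLine(ContainsDisallowed("a_b"));           // True: underscore is not in the list
```

No anchors are needed here, because a single disallowed character anywhere in the string is enough for a match.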
