Regex to match all alphanumeric and certain special characters? - c#

I am trying to get a regex to work that will allow all alphanumeric characters (both caps and non caps as well as numbers) but also allow spaces, forward slash (/), dash (-) and plus (+)?
I have been playing with a refiddle: http://refiddle.com/gqr but so far no success, anyone any ideas?
I'm not sure if it makes any difference but I am trying to do this in c#?

If you want to allow only those, you will also need the use of the anchors ^ and $.
^[a-zA-Z0-9_\s\+\-\/]+$
^                    ^^
This is your regex and I added characters as indicated from the second line. Don't forget the + or * near the end to allow for more than 1 character (0 or more in the case of *), otherwise the regex will try to match only one character, even with .Matches.
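You can also replace the whole class [A-Za-z0-9_] by one \w (note that in .NET \w also matches non-ASCII letters and digits unless RegexOptions.ECMAScript is set), like so: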
^[\w\s\+\-\/]+$
EDIT:
You can actually drop all of the escaping: inside a character class + and / need no backslash, and neither does - as long as it is placed carefully (i.e. ensure the - is either at the beginning or at the end of the class):
^[\w\s+/-]+$
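A minimal sketch of how the final pattern could be used for validation in C#. The class and method names (AllowedCharsDemo, IsAllowed) are made up for illustration and are not from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowedCharsDemo
{
    // The anchored pattern from the answer: word characters, whitespace, +, / and -.
    // The verbatim string (@"...") keeps the backslashes as written.
    private static readonly Regex Allowed = new Regex(@"^[\w\s+/-]+$");

    // Hypothetical helper: true when the whole input consists of allowed characters.
    public static bool IsAllowed(string input)
    {
        return Allowed.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsAllowed("Room 12/A +4-2")); // True
        Console.WriteLine(IsAllowed("not&allowed"));    // False
        Console.WriteLine(IsAllowed(""));               // False, + needs at least one character
    }
}
```

If underscores or non-ASCII letters should be rejected, spelling out the ranges ([A-Za-z0-9 /+-]) instead of \w and \s is probably the safer choice.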

Your regex would look something like:
/[\w\d\/\-\+ ]+/g
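That's all letters, digits, the underscore (which \w includes), and / - + and spaces (but not any other whitespace characters). The \d is redundant because \w already covers digits, and the surrounding /.../g is JavaScript literal syntax; in C# you pass only the pattern between the slashes.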
The + at the end means that at least 1 character is required. Change it to a * if you want to allow an empty string.

This code does that:
var input = "Test if / this+-works&sec0nd 2 part*3rd   part";
var matches = Regex.Matches(input, @"([0-9a-zA-Z /+-]+)");
foreach (Match m in matches) if (m.Success) Console.WriteLine(m.Value);
And output will have 3 result lines:
Test if / this+-works
sec0nd 2 part
3rd---part (I showed spaces with - here)

Related

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regex expression that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex expression though. So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing but with the dash preceding, but also take into account that the number following the dash could be a negative number... didnt think of this one when I wrote the question immediately. for example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(@"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The @ makes this a verbatim string literal, which eliminates the need to double the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for one or more digits, followed by a dash, followed by one or more digits, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(@"(?<=\-)\-?\d+");
(?<=\-) - Positive lookbehind which will check and make sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
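Both lookaround patterns can be tried against the inputs from the question. This is only a sketch; DashNumberDemo and its two methods are hypothetical names, and an empty string is returned when nothing matches:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DashNumberDemo
{
    // Digits that are immediately followed by a dash (the dash is not part of the match).
    public static string NumberBeforeDash(string input)
    {
        Match m = Regex.Match(input, @"\d+(?=-)");
        return m.Success ? m.Value : string.Empty;
    }

    // Digits, with an optional minus sign, that are immediately preceded by a dash.
    public static string NumberAfterDash(string input)
    {
        Match m = Regex.Match(input, @"(?<=-)-?\d+");
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(NumberBeforeDash("test test 310-315"));  // 310
        Console.WriteLine(NumberAfterDash("test test 310--315"));  // -315
        Console.WriteLine(NumberAfterDash("test 310-315"));        // 315
    }
}
```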
(?'number'\d+)- will work (no need to escape the dash). In this example the number is captured in the named group 'number'.
If you want to match both numbers, each with an optional sign, try:
@"(?'first'-?\d+)-(?'second'-?\d+)"
To describe it: nothing complicated, just -? to match an optional - and \d+ to match one or more digits. A literal - matches itself.
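As a rough illustration of reading the two named groups, here is a sketch; SignedPairDemo and TryParsePair are invented names, and parsing with the invariant culture is an assumption about what the input looks like:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class SignedPairDemo
{
    // Two optionally signed integers separated by a literal dash.
    private static readonly Regex Pair = new Regex(@"(?'first'-?\d+)-(?'second'-?\d+)");

    // Hypothetical helper: extracts both numbers, or returns false when the input has no pair.
    public static bool TryParsePair(string input, out int first, out int second)
    {
        first = 0;
        second = 0;
        Match m = Pair.Match(input);
        if (!m.Success) return false;

        // Named groups are read by name; very long digit runs would overflow int.Parse.
        first = int.Parse(m.Groups["first"].Value, CultureInfo.InvariantCulture);
        second = int.Parse(m.Groups["second"].Value, CultureInfo.InvariantCulture);
        return true;
    }

    public static void Main()
    {
        if (TryParsePair("test test 310--315", out int a, out int b))
        {
            Console.WriteLine(a); // 310
            Console.WriteLine(b); // -315
        }
    }
}
```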
here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
in the comments section of that page, it suggests escaping the dash with '\-'
make sure you escape your escape character \
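A dash only has a special meaning in regex (a range) inside a character class such as [a-z]; outside one it matches itself, so escaping it with a backslash (\) is optional but harmless. Since the backslash also has a special meaning in regular C# string literals (escaping quotes or forming special characters), each backslash in the pattern has to be doubled there. So in a regular string literal the pattern would be written \\d+\\-.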
\b\d*(?=\-) you will want to look ahead for the dash
\b = start at a word boundary
\d = match any decimal digit
* = match the previous as many times as needed, including zero times (use + to require at least one digit)
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting

Regex for my password validation

I need a regular expression for my password format. It must ensure that the password only contains letters a-z, digits 0-9 and special characters: .@#$%&.
I am using .NET C# programming language.
This is my code:
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]$");
if (!userAndPassPattern.IsMatch(username) || !userAndPassPattern.IsMatch(password))
return false;
The problem is that I always get back false.
!A || !B is logically equivalent to !(A && B)
So you could write better
!(userAndPassPattern.IsMatch(username) && userAndPassPattern.IsMatch(password))
Then you have the special character $ in your character class; maybe you need to escape it as \$.
I'm not quite sure about this, because inside a character class it is not a special character (in .NET both $ and . are literal inside [...]). If you escape the $ it does no harm ([a-z0-9.@#\$%&]).
Then you have just a single character to match. You need a quantifier:
[a-z0-9.@#$%&] means one single character out of the given set; it will match a or b or 0 but not ab.
[a-z0-9.@#$%&]+ means one or more characters out of the given set; it will match a, b, ab, ba, etc.
edit
This is what you want
Regex userAndPassPattern = new Regex(@"^[a-z0-9\.@#\$%&]+$");
if (!(userAndPassPattern.IsMatch(username) && userAndPassPattern.IsMatch(password))) {
return false;
}
You forgot to add the '+' for matching one or more times:
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]+$");
This code Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]$"); will only match a username or password that is a single character long.
You are looking for something like this: Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]+$"); which will match one or more of the characters in your class. The + symbol tells it to match one or more of the previous atom (which in this case is the character class you specified in the square brackets).
Also, if you did not mean to constrain the match to lowercase characters, you should add 'A-Z' to the character class: Regex userAndPassPattern = new Regex("^[A-Za-z0-9.@#$%&]+$");
You might also want to implement a minimum length restriction which can be accomplished by replacing the + with the {n,} construct, where n is the minimum length you want to match. For example:
This would match a minimum of 6 characters:
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]{6,}$");
This would match a minimum of 6 and a maximum of 12:
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]{6,12}$");
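Putting the quantifier and the length limits together, a check could look roughly like the sketch below. It assumes the allowed special characters are . @ # $ % & and that 6 to 12 characters is the wanted length; CredentialFormatDemo and IsValid are made-up names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CredentialFormatDemo
{
    // Lowercase letters, digits and . @ # $ % &, between 6 and 12 characters in total.
    // Inside a character class . and $ are literal, so no escaping is needed.
    private static readonly Regex Pattern = new Regex(@"^[a-z0-9.@#$%&]{6,12}$");

    // Hypothetical helper: null is treated as invalid instead of throwing.
    public static bool IsValid(string value)
    {
        return value != null && Pattern.IsMatch(value);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("pass@word#1")); // True
        Console.WriteLine(IsValid("a"));           // False, too short
        Console.WriteLine(IsValid("Password1"));   // False, uppercase is not in the class
    }
}
```

The original check then becomes: if (!IsValid(username) || !IsValid(password)) return false;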
The real problem is that you are matching only 1 character: add a + before the last $. Escaping . and $ is optional, since both are literal inside a character class, but it does no harm:
^[a-z0-9\.@#\$%&]+$
Edit: Another suggestion, if you have a minimum/maximum length you can replace the + with, for example, {6,16} or whatever you think is appropriate. This will match strings that are 6 to 16 characters long inclusive and reject any shorter or longer strings. If you don't care about an upper limit, you could use {6,}.
Have you tried using a verbatim string literal when you're using regex escape sequences?
Regex userAndPassPattern = new Regex(@"^[a-z0-9@#\$%&]+$");
if (!userAndPassPattern.IsMatch(username) || !userAndPassPattern.IsMatch(password))
return false;
Your pattern only allows a single character from the set; you probably want a repetition operator like *, + or {10,}.
Your character set includes a dot. Inside a character class the dot is literal (it does not match any character there), so escaping it as \. is optional.

Regex match if a string has length 2 and contains 1 letter and 1 number

Guys I hate Regex and I suck at writing.
I have a string that is space separated and contains several codes that I need to pull out. Each code is marked by beginning with a capital letter and ending with a number. The code is only two digits.
I'm trying to create an array of strings from the initial string and I can't get the regular expression right.
Here is what I have
String[] test = Regex.Split(originalText, "([a-zA-Z0-9]{2})");
I also tried:
String[] test = Regex.Split(originalText, "([A-Z]{1}[0-9]{1})");
I don't have any experience with Regex as I try to avoid writing them whenever possible.
Anyone have any suggestions?
Example input:
AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80
I need to pull out F7, A4, B7. E0 should be ignored.
You want to collect the results, not split on them, right?
Regex regexObj = new Regex(@"\b[A-Z][0-9]\b");
MatchCollection allMatchResults = regexObj.Matches(subjectString);
should do this. The \bs are word boundaries, making sure that only entire strings (like A1) are extracted, not substrings (like the A1 in TWA101).
If you also need to exclude "words" with non-word characters in them (like E0.M80 in your comment), you need to define your own word boundary, for example:
Regex regexObj = new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");
Now A1 only matches when surrounded by whitespace (or start/end-of-string positions).
Explanation:
(?<= # Assert that we can match the following before the current position:
^ # Start of string
| # or
\s # whitespace.
)
[A-Z] # Match an uppercase ASCII letter
[0-9] # Match an ASCII digit
(?= # Assert that we can match the following after the current position:
\s # Whitespace
| # or
$ # end of string.
)
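To see the whitespace-delimited pattern at work on the sample input from the question, something like the following sketch could be used (CodeExtractorDemo and ExtractCodes are hypothetical names):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CodeExtractorDemo
{
    // One uppercase ASCII letter plus one ASCII digit, delimited by whitespace
    // or by the start/end of the string.
    private static readonly Regex Code = new Regex(@"(?<=^|\s)[A-Z][0-9](?=\s|$)");

    // Hypothetical helper: collects the matches instead of splitting on them.
    public static List<string> ExtractCodes(string text)
    {
        var codes = new List<string>();
        foreach (Match m in Code.Matches(text))
        {
            codes.Add(m.Value);
        }
        return codes;
    }

    public static void Main()
    {
        var codes = ExtractCodes("AA2410 F7 A4 Y7 B7 A 0715 0836 E0.M80");
        Console.WriteLine(string.Join(",", codes)); // F7,A4,Y7,B7
    }
}
```

Note that Y7 is returned as well, since it has the same shape as the other codes; E0 is skipped because it is followed by a dot rather than whitespace.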
If you also need to find non-ASCII letters/digits, you can use
\p{Lu}\p{N}
instead of [A-Z][0-9]. This finds all uppercase Unicode letters and Unicode digits (like Ä٣), but I guess that's not really what you're after, is it?
Do you mean that each code looks like "A00"?
Then this is the regex:
"[A-Z][0-9][0-9]"
Very simple... By the way, there's no point writing {1} in a regex. [0-9]{1} means "match exactly one digit", which is exactly like writing [0-9].
Don't give up, simple regexes make perfect sense.
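This should be ok (Regex.Split would return the text between the codes, so Regex.Matches is used here to collect the codes themselves; Cast and Select need using System.Linq):
String[] all_codes = Regex.Matches(originalText, @"\b[A-Z]\d\b").Cast<Match>().Select(m => m.Value).ToArray();
It gives you an array with all codes consisting of a capital letter followed by a digit, delimited by any kind of word boundary (white space etc.)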

Regular Expression: single word

I want to check in a C# program if a user input is a single word. The word may only have characters A-Z and a-z. No spaces or other characters.
I try [A-Za-z]* , but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
...
}
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing, because it will succeed if the string contains zero or more letters.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
I don't know what the C#'s regex syntax is, but try [A-Za-z]+.
Try ^[A-Za-z]+$. If you don't include the ^ and $, it will match any part of the string that has alphabetic characters in it.
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
public Boolean IsSingleWord(String userInput)
{
Regex firstWordRegex = new Regex("^\\b.+?\\b");
Match firstWordMatch = firstWordRegex.Match(userInput);
return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}
The others have written how to resolve the problem you know about. Now I'll speak about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter plus \p{M}* that is 0 or more diacritic marks (in unicode diacritics can be "separated" from base letters, so you can have something like a + ` = à or you can have precomposed characters like the standard à)
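A small sketch of this Unicode-aware check, covering both a precomposed and a decomposed accented letter (UnicodeWordDemo and IsSingleWord are illustrative names only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeWordDemo
{
    // One or more letters, each optionally followed by combining marks.
    private static readonly Regex Word = new Regex(@"^(\p{L}\p{M}*)+$");

    // Hypothetical helper: true when the input is a single word of letters.
    public static bool IsSingleWord(string input)
    {
        return Word.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsSingleWord("caf\u00E9"));  // True, precomposed é
        Console.WriteLine(IsSingleWord("cafe\u0301")); // True, e followed by a combining acute accent
        Console.WriteLine(IsSingleWord("two words"));  // False
    }
}
```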
if you just need the characters a-zA-Z you could simply iterate over the characters and compare the single characters if they are inside your range
for example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This could increase your performance
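The loop described above might be written like this in C#. It is a sketch only: AsciiLetterDemo and IsAsciiWord are invented names, and treating an empty string as invalid is an assumption carried over from the + in the regex answers:

```csharp
using System;

public static class AsciiLetterDemo
{
    // Hypothetical helper: true when the input is non-empty and every character is a-z or A-Z.
    public static bool IsAsciiWord(string input)
    {
        if (string.IsNullOrEmpty(input)) return false;

        foreach (char c in input)
        {
            bool isLetter = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
            if (!isLetter) return false; // stop at the first character outside the ranges
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(IsAsciiWord("Hello"));       // True
        Console.WriteLine(IsAsciiWord("Hello world")); // False
        Console.WriteLine(IsAsciiWord(""));            // False
    }
}
```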

Problem with regex, how do I get all with \S up until a special character?

Ive got the text:
192.168.20.31 Url=/flash/56553550_hi.mp4?token=(uniquePlayerReference=81781956||videoId=1)
And im trying to get the uniquePlayerReference and the videoId
Ive tried this regular expression:
(?<=uniquePlayerReference=)\S*
but it matches:
81781956||videoId=1)
And then I try and get the video id with this:
(?<=videoId=)\S*
But it matches the ) after the videoId.
My question is two fold:
1) How do I use the \S character and get it to stop at a character? (essentially what is the regex to do what i want) I cant get it to stop at a defined character, I think I need to use a positive lookahead to match but not include the double pipe).
2) When should I use brackets?
The problem is the multiplicity operator you have here, the *, which means "as many as possible". If you have an explicit number in mind you can use the operator {a,b}, where a is the minimum and b the maximum number of matches, but if the number is unknown, \S is too generic: it keeps matching until it reaches whitespace.
As for brackets, if you mean () you use them to capture a part of a match for backreferencing. Bit complicated, think you need to use a reference for that.
I think you want something like this:
/uniquePlayerReference=(\d+)\|\|videoId=(\d+)/i
and then backreference to \1 and \2 respectively.
Given that both id's are numeric you are probably better off using \d instead of \S. \d only matches numeric digits whereas \S matches any non-whitespace character.
What you might also do is a non-greedy match up to the character you do not want to match, like so:
uniquePlayerReference=(.*?)\|\|videoId=(.*?)\)
Note that I have escaped both the | and ) characters because otherwise they would have a special meaning inside a regex.
In C# you would use this like so: (which also answers your question what the brackets are for, they are meant to capture parts of the matched result).
Regex regex = new Regex(@"uniquePlayerReference=(.*?)\|\|videoId=(.*?)\)");
Match match = regex.Match(
"192.168.20.31 Url=/flash/56553550_hi.mp4?token=(uniquePlayerReference=81781956||videoId=1)");
if (match.Success)
{
string playerReference = match.Groups[1].Value;
string videoId = match.Groups[2].Value;
// Etc.
}
If the ID isn't just digits then you could use [^|] instead of \S, i.e.
(?<=uniquePlayerReference=)[^|]*
Then you can use
(?<=videoId=)[^)]*
For the video ID
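For illustration, the two lookbehind patterns can be run against the line from the question like this (TokenFieldsDemo and its methods are hypothetical names; an empty string comes back when the field is missing):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenFieldsDemo
{
    // Everything after "uniquePlayerReference=" up to, but not including, the first |.
    public static string PlayerReference(string line)
    {
        return Regex.Match(line, @"(?<=uniquePlayerReference=)[^|]*").Value;
    }

    // Everything after "videoId=" up to, but not including, the closing parenthesis.
    public static string VideoId(string line)
    {
        return Regex.Match(line, @"(?<=videoId=)[^)]*").Value;
    }

    public static void Main()
    {
        string line = "192.168.20.31 Url=/flash/56553550_hi.mp4?token=(uniquePlayerReference=81781956||videoId=1)";
        Console.WriteLine(PlayerReference(line)); // 81781956
        Console.WriteLine(VideoId(line));         // 1
    }
}
```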
The \S means it matches any non-whitespace character, including the closing parenthesis. So if you had to use \S, you would have to explicitly say stop at the closing parenthesis, like this:
videoId=(\S+)\)
Therefore, you are better off using the \d, since what you are looking for are numeric:
uniquePlayerReference=(\d+)
videoId=(\d+)
