regular expression ".*[^a-zA-Z0-9_].*" - c#

As I am trying to read more about regular expressions in C#, I just want to check a conclusion I came to.
For the expression ".*[^a-zA-Z0-9_].*", the ".*" at the beginning and end are useless, is that right? As I understand it, ".*" means zero or more occurrences of any character, and "[^a-zA-Z0-9_]" means any single character other than a letter (either case), a digit or an underscore, so adding ".*" before and after "[^a-zA-Z0-9_]" changes nothing. Is that right?
Here is the code I am using to check whether the expression matches:
// Here we call Regex.Match.
Match match = Regex.Match("anytest#", ".*[^a-z A-Z0-9_].*");
//Match match = Regex.Match("anytest#", "[^a-z A-Z0-9_]");
// Here we check the Match instance.
if (match.Success)
    Console.WriteLine("error");
else
    Console.WriteLine("no error");

.*[^a-zA-Z0-9_].* will match the entire input as long as there is a character other than a letter, digit or underscore somewhere in the input. [^a-zA-Z0-9_] will match only a single such character (Regex.Match returns the first one, since matching proceeds left to right) if it is somewhere in the input. Which one you want depends on the input and on what you want to do once you know that such a character exists in the input.

The only difference is whether the surrounding characters are included in the matched text.
For:
ab41--_71j
the pattern with .* matches the whole input:
ab41--_71j
And without the .* at the beginning and end, the first match is the single character:
-
Any string will match the .*[^a-zA-Z0-9_].* regex as long as it has at least one character that isn't in a-zA-Z0-9_
From the latest comment on your question, I understand that you actually use:
^[a-zA-Z0-9]*$
This will match only if all characters are digits or letters.
If it doesn't match, then the string is invalid.
If you also want to allow the _ character, then use:
^[a-zA-Z0-9_]*$
Which can even be shortened to the following (note that in .NET \w also matches non-ASCII letters and digits unless RegexOptions.ECMAScript is used):
^\w*$
In general, it is better to write regexes that validate strings rather than invalidate them. It just makes more sense and is more intuitive.
So my validation would look like:
if (Regex.IsMatch("anytest#", "^\\w*$"))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}
Another option that is probably faster (it needs using System.Linq):
if ("anytest#".ToCharArray().All(c => char.IsLetterOrDigit(c) || c == '_'))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}
And if you don't want '_' to be included, it can look even nicer:
if ("anytest#".ToCharArray().All(char.IsLetterOrDigit))
{
    Console.WriteLine("Success");
}
else
{
    Console.WriteLine("Error");
}
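The three validation styles above do not accept exactly the same characters, which is easy to miss. The following is a minimal sketch, assuming the goal is "letters, digits and underscore only"; the class and method names are invented for illustration and are not part of the original answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCharValidation
{
    // \w in .NET is Unicode-aware by default, so non-ASCII letters pass.
    public static bool IsWordCharsUnicode(string s) =>
        Regex.IsMatch(s, @"^\w*$");

    // RegexOptions.ECMAScript restricts \w to [a-zA-Z_0-9].
    public static bool IsWordCharsAscii(string s) =>
        Regex.IsMatch(s, @"^\w*$", RegexOptions.ECMAScript);

    // LINQ variant from the answer; char.IsLetterOrDigit is also Unicode-aware.
    public static bool IsWordCharsLinq(string s) =>
        s.All(c => char.IsLetterOrDigit(c) || c == '_');
}
```

If only ASCII input is acceptable, the explicit class ^[a-zA-Z0-9_]*$ or the ECMAScript option is the safer choice; the other two variants accept a string such as "ß".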

No, because there are characters other than a-z, A-Z, 0-9 and _.
.*[^a-zA-Z0-9_].* matches any run of characters, then one character that is not in a-zA-Z0-9_, then any run of characters again. With greedy matching it therefore consumes the whole line.
If you leave out the .*, you just have a regex that matches one character that is not in a-zA-Z0-9_.
.*[^a-zA-Z0-9_].* matches, for instance: ABC_ß_ABC
[^a-zA-Z0-9_] matches, for instance: ß (this regex matches just 1 character)

Input 1 : ABC_ß_ABC
Input 2 : ß
Regex 1: .*[^a-zA-Z0-9_].*
Regex 2: [^a-zA-Z0-9_]
Both inputs match both regexes.
For input 1:
Regex 1 matches all 9 characters
Regex 2 matches only 1 character (the ß)

Only include those tokens in the Regex that you are actually looking for. In your case you don't care whether there are any other characters before or after the negated character class you specified. Adding .* before and after it doesn't change whether the match succeeds, but it makes the engine do more work. A Regex already matches anywhere in the input unless you specifically anchor it, e.g. using ^ at the start.
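The difference between "does it match" and "what does it match" can be checked directly. This is a minimal sketch using the two patterns from the question; the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class WildcardDemo
{
    // Bare negated class: succeeds if any character is outside a-zA-Z0-9_.
    public static bool HasInvalidChar(string input) =>
        Regex.IsMatch(input, "[^a-zA-Z0-9_]");

    // Same class padded with .* on both sides: same success or failure.
    public static bool HasInvalidCharPadded(string input) =>
        Regex.IsMatch(input, ".*[^a-zA-Z0-9_].*");

    // Text of the first match, to show what each pattern actually consumes.
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;
}
```

For the input "ab41--_71j" both boolean methods agree, while FirstMatch returns a single "-" for the bare class and the whole input for the padded pattern.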

Related

Regular Expression Match for a Title

I need to use C# to write a Regular Expression for a title, here is the requirement:
Title is required (length > 0);
Maximum 256 characters (length <= 256);
No character is forbidden, but whitespace only is illegal (the title ONLY containing whitespaces is illegal);
No leading or trailing whitespaces;
I already have this:
^.{1,256}$
So how can I meet rule 3?
EDIT:
Explained rule 3 more clearly;
I added rule 4 from Mario's answer.
I'd skip regular expressions completely, because you can just hardcode string cleanup and validation in a few simple steps:
Use String.Trim(null) to remove all leading/trailing whitespace.
Check the length of the remaining string.
Uppercase the first character (if you want to).
This works because a title consisting only of whitespace would be trimmed to length 0.
It also avoids titles such as " Let's go!".
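The trim-and-compare idea can be sketched without any regex. This is only an illustration under the four rules listed in the question; the class and method names are hypothetical, and it rejects untrimmed input instead of silently cleaning it.

```csharp
public static class TitleRules
{
    // Rules 1 and 2: length between 1 and 256.
    // Rules 3 and 4: Trim() removes leading/trailing whitespace, so a title
    // that changes when trimmed (or becomes empty) is rejected.
    public static bool IsValidTitle(string title) =>
        !string.IsNullOrEmpty(title)
        && title.Length <= 256
        && title == title.Trim();
}
```

A whitespace-only title fails because trimming it yields the empty string, which no longer equals the original.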
You need to use a zero-width assertion:
@"^(?=.*\S).{1,256}$"
(?=.*\S) requires that some sequence of characters ending in a non-whitespace character follows, but it consumes nothing and so does not affect the rest of the match.
Use the (?=pattern) lookahead construct:
@"^(?=.*\S).{1,256}$"
The (?=pattern) asserts that the specified pattern can be matched starting at this location.
So, the regex matches if and only if, from the beginning of the string, the pattern .*\S matches and the whole string matches the pattern ^.{1,256}$
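A short sketch of what the lookahead pattern accepts and rejects may help here. The wrapper class and method name are assumptions made for illustration; only the pattern itself comes from the answers.

```csharp
using System.Text.RegularExpressions;

public static class TitleLookahead
{
    // (?=.*\S) demands at least one non-whitespace character somewhere;
    // .{1,256} then enforces the length limits.
    private static readonly Regex Pattern = new Regex(@"^(?=.*\S).{1,256}$");

    public static bool HasContent(string title) => Pattern.IsMatch(title);
}
```

Note that this pattern covers rules 1 to 3 only: a title such as " a " still passes, so the no-leading/trailing-whitespace rule needs a separate check or a different pattern.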
Although my own answer fits my question, the credit should still go to the other answerers (I upvoted them and accepted one of their answers), because I edited my question after they had answered.
=====================
I finally came up with a pure regex solution (without any extra steps):
^(\S|\S.{0,254}\S)$
(though I don't understand why the parentheses () are important)
The following test cases pass:
[TestMethod]
public void CheckTitleTest()
{
    // Empty
    Assert.IsFalse(CheckTitle(@""));
    // A whitespace
    Assert.IsFalse(CheckTitle(@" "));
    // Multiple whitespace only
    // http://msdn.microsoft.com/en-us/library/t809ektx.aspx
    Assert.IsFalse(CheckTitle(" \t \n \u1680"));
    // Leading whitespaces
    Assert.IsFalse(CheckTitle(" \tabc"));
    // Trailing whitespaces
    Assert.IsFalse(CheckTitle("abc\t "));
    // Leading and trailing whitespaces
    Assert.IsFalse(CheckTitle(" \tabc\t "));
    // Too long: 257 character
    Assert.IsFalse(CheckTitle(@"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/*"));
    // A normal title
    Assert.IsTrue(CheckTitle(@"This is a normal title"));
    Assert.IsTrue(CheckTitle(@"This is a normal title."));
    // 256 characters
    Assert.IsTrue(CheckTitle(@"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"));
    // A very simple title
    Assert.IsTrue(CheckTitle(@"A"));
    Assert.IsTrue(CheckTitle(@"!"));
    Assert.IsTrue(CheckTitle(@"\"));
}
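On why the parentheses in ^(\S|\S.{0,254}\S)$ matter: alternation binds more loosely than anchors and concatenation, so without the group the pattern reads as "^\S" OR "\S.{0,254}\S$", and each half is anchored on one side only. The sketch below compares the two forms; the body of CheckTitle is an assumption, since the post shows only its tests, and CheckTitleUngrouped is a hypothetical helper.

```csharp
using System.Text.RegularExpressions;

public static class TitleRegex
{
    // Grouped: both alternatives sit between ^ and $.
    private static readonly Regex Grouped = new Regex(@"^(\S|\S.{0,254}\S)$");

    // Ungrouped: "starts with a non-space" OR "ends with non-space ... non-space".
    private static readonly Regex Ungrouped = new Regex(@"^\S|\S.{0,254}\S$");

    public static bool CheckTitle(string title) => Grouped.IsMatch(title);

    public static bool CheckTitleUngrouped(string title) => Ungrouped.IsMatch(title);
}
```

With the ungrouped form, "abc\t " is accepted through the first alternative and " \tabc" through the second, which is exactly what the test cases forbid. One caveat worth knowing: in .NET, $ also matches just before a final newline, so \z is the stricter end anchor if that matters.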

.NET Regex: negate previous character for the first character in string

Consider following string
"Some" string with "quotes" and \"pre-slashed\" quotes
Using regex, I want to find all the double quotes with no slash before them. So I want the regex to find four matches for the example sentence
This....
[^\\]"
...would find only three of them. I suppose that's because the very first quote has no character before it, while the regex insists on matching one non-slash character in front of each quote.
That means I need to write a regex with some kind of look-behind, but I don't know how to work with these lookaheads and lookbehinds... I'm not even sure that's what I'm looking for.
The following attempt returns 6, not 4 matches...
"(?<!\\)
"(?<!\\")
Is what you're looking for
If you want to match "Some" and "quotes", then
(?<!\\")(?!\\")"[a-zA-Z0-9]*"
will do
Explanation:
(?<!\\") - Negative lookbehind. Specifies a group that can not match before your main expression
(?!\\") - Negative lookahead. Specifies a group that can not match after your main expression
"[a-zA-Z0-9]*" - String to match between regular quotes
Which means - match anything that doesn't come with \" before and \" after, but is contained inside double quotes
You almost got it, move the quote after the lookbehind, like:
(?<!\\)"
Also beware of cases like
"escaped" backslash \\"string\"
where the quote after \\ is not escaped (the backslash itself is). You can use an expression like this to handle those:
(?<!\\)(?:\\\\)*"
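The three patterns discussed so far can be compared by counting matches. This is a small sketch; the class and method names are invented for illustration, and the patterns are the ones quoted above, written as C# verbatim strings (where a literal quote is doubled).

```csharp
using System.Text.RegularExpressions;

public static class QuoteCounter
{
    // The asker's first attempt: needs a real character before the quote,
    // so a quote at the very start of the input is missed.
    public static int CountNaive(string s) =>
        Regex.Matches(s, @"[^\\]""").Count;

    // Negative lookbehind: a quote not preceded by a backslash.
    public static int CountLookbehind(string s) =>
        Regex.Matches(s, @"(?<!\\)""").Count;

    // Also accepts a quote preceded by an even number of backslashes.
    public static int CountEscapeAware(string s) =>
        Regex.Matches(s, @"(?<!\\)(?:\\\\)*""").Count;
}
```

On the example sentence the naive pattern finds 3 quotes and the lookbehind finds 4. On the escaped-backslash example the plain lookbehind finds only 2, while the escape-aware pattern finds all 3 unescaped quotes.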
Try this
(?<!\\)(?<qs>"[^"]+")
Explanation
(?<!\\)(?<qs>"[^"]+")
Options: case insensitive
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\\)»
    Match the character “\” literally «\\»
Match the regular expression below and capture its match into backreference with name “qs” «(?<qs>"[^"]+")»
    Match the character “"” literally «"»
    Match any character that is NOT a “"” «[^"]+»
        Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
    Match the character “"” literally «"»
Code:
try {
    if (Regex.IsMatch(subjectString, @"(?<!\\)(?<qs>""[^""]+"")", RegexOptions.IgnoreCase)) {
        // Successful match
    } else {
        // Match attempt failed
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}

Regex for my password validation

I need a regular expression for my password format. It must ensure that password only contains letters a-z, digits 0-9 and special characters: .##$%&.
I am using .NET C# programming language.
This is my code:
Regex userAndPassPattern = new Regex("^[a-z0-9.##$%&]$");
if (!userAndPassPattern.IsMatch(username) || !userAndPassPattern.IsMatch(password))
return false;
The problem is that I always get back false.
!A || !B is logically equivalent to !(A && B)
So you could write better
!(userAndPassPattern.IsMatch(username) && userAndPassPattern.IsMatch(password))
Then you have the special character $ in your character class; maybe you need to escape it as \$
I'm not quite sure about this, because inside a character class it is not a special character (in .NET it is literal there). Escaping the $ does no harm ([a-z0-9.##\$%&])
Then you have just a single character to match. You need a quantifier
[a-z0-9.##$%&] means one single character out of the given set; it will match a or b or 0 but not ab
[a-z0-9.##$%&]+ means one or more characters out of the given set; it will match a, b, ab, ba, etc.
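Edit:
This is what you want (a verbatim string is needed here, because \. and \$ are not valid escape sequences in a regular C# string literal):
Regex userAndPassPattern = new Regex(@"^[a-z0-9\.##\$%&]+$");
if (!(userAndPassPattern.IsMatch(username) && userAndPassPattern.IsMatch(password))) {
    return false;
}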
You forgot to add the '+' for matching one or more times:
Regex userAndPassPattern = new Regex("^[a-z0-9.##$%&]+$");
This code Regex userAndPassPattern = new Regex("^[a-z0-9.##$%&]$"); will only match a username or password that is a single character long.
You are looking for something like this Regex userAndPassPattern = new Regex("^[a-z0-9.##$%&]+$"); which will match one or more of the characters in your class. The + symbol tells it to match one or more of the previous atom (which in this case is the character class you specified in the square brackets)
Also, if you did not mean to constrain the match to lowercase characters, you should add 'A-Z' to the character class: Regex userAndPassPattern = new Regex("^[A-Za-z0-9.##$%&]+$");
You might also want to implement a minimum length restriction which can be accomplished by replacing the + with the {n,} construct, where n is the minimum length you want to match. For example:
this would match a minimum of 6 characters
Regex userAndPassPattern = new Regex("^[a-z0-9.##$%&]{6,}$");
this would match a minimum of 6 and a maximum of 12
Regex userAndPassPattern = new Regex("^[a-z0-9.##$%&]{6,12}$");
The real problem is that you are matching only 1 character. Add a + before the last $. Escaping . and $ is optional, because both are literal inside a character class:
^[a-z0-9\.##\$%&]+$
Edit: Another suggestion, if you have a minimum/maximum length you can replace the + with, for example, {6,16} or whatever you think is appropriate. This will match strings that are 6 to 16 characters long inclusive and reject any shorter or longer strings. If you don't care about an upper limit, you could use {6,}.
Have you tried using a verbatim string literal when you're using regex escape sequences?
Regex userAndPassPattern = new Regex(@"^[a-z0-9##\$%&]+$");
if (!userAndPassPattern.IsMatch(username) || !userAndPassPattern.IsMatch(password))
    return false;
Your pattern only allows a single character from the set; you probably want a repetition operator such as *, + or {10,}.
Your character set includes ., which is fine: inside a character class . is a literal dot rather than "any character", so escaping it as \. is optional.
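Putting the answers together, a validator might look like the sketch below. It is an illustration only: the class and method names are hypothetical, the allowed-character set is copied from the question as it appears here (with the duplicate # dropped) and should be adjusted to the real list, and the 6 to 12 length bounds are just the example values mentioned above.

```csharp
using System.Text.RegularExpressions;

public static class CredentialFormat
{
    // '.' and '$' are literal inside [...], so no escaping is needed.
    // The '+' is the fix: one or more allowed characters, not exactly one.
    private static readonly Regex AllowedChars =
        new Regex(@"^[a-z0-9.#$%&]+$");

    // Same class with an explicit length range instead of '+'.
    private static readonly Regex AllowedCharsWithLength =
        new Regex(@"^[a-z0-9.#$%&]{6,12}$");

    public static bool IsAllowed(string value) => AllowedChars.IsMatch(value);

    public static bool IsAllowedWithLength(string value) =>
        AllowedCharsWithLength.IsMatch(value);
}
```

Because the dot is literal inside the class, a value such as "a*b" is still rejected, which shows that the unescaped . does not act as a wildcard there.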

Regular Expression: single word

I want to check in a C# program whether a user input is a single word. The word may only have the characters A-Z and a-z. No spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
    ...
}
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding element" (here, the character class).
This explains why your expression is failing: it succeeds on any input, because every string contains zero or more letters.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
I don't know what the C#'s regex syntax is, but try [A-Za-z]+.
Try ^[A-Za-z]+$ If you don't include the ^$ it will match on any part of the string that has a alpha characters in it.
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
public Boolean IsSingleWord(String userInput)
{
    Regex firstWordRegex = new Regex("^\\b.+?\\b");
    Match firstWordMatch = firstWordRegex.Match(userInput);
    return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}
The others have written how to solve the problem you know about. Now I'll speak about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter plus \p{M}* that is 0 or more diacritic marks (in unicode diacritics can be "separated" from base letters, so you can have something like a + ` = à or you can have precomposed characters like the standard à)
If you just need the characters a-zA-Z, you could simply iterate over the characters and check that each one is inside your range,
for example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This could improve performance, since no regex engine is involved.
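The per-character loop and the Unicode-aware pattern from the answers above can be sketched side by side. The class and method names are invented for illustration; whether the loop is actually faster in a given program would need measuring.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleWord
{
    // Plain range check, the C# form of the pseudocode loop: ASCII letters only.
    public static bool IsAsciiWord(string s) =>
        s.Length > 0
        && s.All(c => ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z'));

    // Any Unicode letter, each optionally followed by combining marks.
    public static bool IsUnicodeWord(string s) =>
        Regex.IsMatch(s, @"^(\p{L}\p{M}*)+$");
}
```

The first method rejects accented input such as "été", while the second accepts it, both in precomposed form and as a base letter followed by a combining accent.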

regex for capturing digits and digit ranges

I have the following string:
Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)
I want to capture:
2121,323.222
2–2.4
0.5
i.e. I want the above three results from the string.
Can anyone help me with this regex?
I noticed that the hyphen in 2–2.4kg is not really a hyphen; it's the Unicode character U+2013, the en dash.
So, here is another regex in C#:
@"[0-9]+([,.\u2013-][0-9]+)*"
Test:
MatchCollection matches = Regex.Matches("Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)", @"[0-9]+([,.\u2013-][0-9]+)*");
foreach (Match m in matches) {
    Console.WriteLine(m.Groups[0]);
}
Here are the results. My console does not support printing the Unicode character U+2013, so it shows "?", but it is properly matched.
2121,323.222
2?2.4
0.5
Okay I didn't notice the C# tag until now. I will leave the answer but I know that's not what you expected, see if you can do something with it. Perhaps the title should have mentioned the programming language?
Sure:
Fat mass loss was (.*) greater for GPLC \((.*) vs. (.*)kg\)
Find your substrings in \1, \2 and \3.
If for Emacs, swap all parentheses and escaped parentheses.
How about something like this:
^.*((?:\d+,)*\d+(?:\.\d+)?).*(\d+(?:\.\d+)?(?:-\d+(?:\.\d+))?).*(\d+(?:\.\d+)).*$
A little more general, I think. I'm a little concerned about .* being greedy.
Fat mass loss was 2121,323.222 greater
for GPLC (2–2.4kg vs. 0.5kg)
a generalized extractor:
/\D+?([\d\,\.\-]+)/g
explanation:
/             # start pattern
\D+?          # 1 or more non-digits, as few as possible
(             # capture group 1
[\d\,\.\-]+   # character class: 1 or more digits, commas, periods or hyphens
)             # end capture group 1
/g            # trailing g modifier (make the regex continue after the last match)
Sorry, I don't know C# well enough for a full writeup, but the pattern should plug right in.
see: http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx for some implementation examples.
I came out with something like this atrocity:
-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?(?:[–-]-?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))?)?
Of which -?\d(?:,?\d)*(?:\.(?:\d(?:,?\d)*\d|\d))? is repeated twice, with [–-] in the middle (note that the first character in that class is a long hyphen, the en dash).
This should take care of dots and commas outside of numbers, eg: hello,23,45.2-7world - will capture 23,45.2-7.
It looks like you're trying to find all numbers in the string (possibly with commas inside the number), and all ranges of numbers such as "2-2.4". Here is a regex that should work:
\d+(?:[,.-]\d+)*
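From C# 3, you can use it like this:
var input = "Fat mass loss was 2121,323.222 greater for GPLC (2-2.4kg vs. 0.5kg)";
var pattern = @"\d+(?:[,.-]\d+)*";
var matches = Regex.Matches(input, pattern);
// Name the Match type explicitly: on .NET Framework, MatchCollection only
// implements the non-generic IEnumerable, so "var" would give object.
foreach (Match match in matches)
    Console.WriteLine(match.Value);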
Hmm, this is a tricky question, especially because the input string contains the Unicode character – (EN DASH) instead of - (HYPHEN-MINUS). Therefore the correct regex to match the numbers in the original string would be:
\d+(?:[\u2013,.]\d+)*
A more generic approach would be:
\d+(?:[\p{Pd}\p{Pc}\p{Po}]\d+)*
which matches dash punctuation, connector punctuation and other punctuation. See here for more information about those.
An implementation in C# would look like this:
string input = "Fat mass loss was 2121,323.222 greater for GPLC (2–2.4kg vs. 0.5kg)";
try {
    Regex rx = new Regex(@"\d+(?:[\p{Pd}\p{Pc}\p{Po}\p{C}]\d+)*", RegexOptions.IgnoreCase | RegexOptions.Multiline);
    Match match = rx.Match(input);
    while (match.Success) {
        // matched text: match.Value
        // match start: match.Index
        // match length: match.Length
        match = match.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}
Let's try this one :
(?=\d)([0-9,.-]+)(?<=\d)
It captures all expressions containing only :
"[0-9,.-]" characters,
must start with a digit "(?=\d)",
must finish with a digit "(?<=\d)"
It works with a single digit expression and does not include beginning or trailing [.,-].
Hope this helps.
I got the solution to my problem.
The following is the Regex that gave my desired result:
(([0-9]+)([–.,-]*))+
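For comparison, here is a compact sketch that collects the three values into a list. It uses the digits-joined-by-separator shape suggested in several answers, with both the ASCII hyphen and the en dash (U+2013) allowed as separators; the class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // One or more digits, then any number of (separator + digits) groups.
    // Separators: comma, period, en dash (\u2013) and hyphen-minus.
    private static readonly Regex NumberOrRange =
        new Regex(@"\d+(?:[,.\u2013-]\d+)*");

    public static List<string> Extract(string input) =>
        NumberOrRange.Matches(input)
                     .Cast<Match>()          // works on older frameworks too
                     .Select(m => m.Value)
                     .ToList();
}
```

Because every separator must be followed by a digit, trailing punctuation such as the period in "vs." is never pulled into a match.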
