Regular Expression: single word - c#

I want to check in a C# program whether a user input is a single word. The word may only contain the characters A-Z and a-z. No spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
...
}
Can you recommend a website with a comprehensive list of regex examples?

It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
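To make the difference concrete, here is a minimal sketch comparing the unanchored pattern from the question with the anchored one. The class and method names (AnchorDemo, IsSingleWord) are made up for illustration and are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: true only when the whole input is one or more ASCII letters.
    public static bool IsSingleWord(string input) =>
        Regex.IsMatch(input, "^[A-Za-z]+$");

    public static void Main()
    {
        // Unanchored with *: an empty match always exists, so IsMatch is true for any input.
        Console.WriteLine(Regex.IsMatch("two words", "[A-Za-z]*")); // True
        Console.WriteLine(IsSingleWord("two words")); // False
        Console.WriteLine(IsSingleWord("word"));      // True
        Console.WriteLine(IsSingleWord(""));          // False
    }
}
```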

You should add anchors for start and end of string: ^[A-Za-z]+$

Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.

The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing, because it will succeed if the string contains zero or more letters.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.

I don't know what C#'s regex syntax is, but try [A-Za-z]+.

Try ^[A-Za-z]+$. If you don't include the ^ and $, it will match on any part of the string that has alphabetic characters in it.

I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
public Boolean IsSingleWord(String userInput)
{
Regex firstWordRegex = new Regex("^\\b.+?\\b");
Match firstWordMatch = firstWordRegex.Match(userInput);
return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}

The others have written how to solve the problem you know about. Now I'll talk about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter plus \p{M}* that is 0 or more diacritic marks (in unicode diacritics can be "separated" from base letters, so you can have something like a + ` = à or you can have precomposed characters like the standard à)
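A short sketch of how this pattern behaves with precomposed and decomposed accents. The class and method names are illustrative only, and the sample words are my own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeWordDemo
{
    // \p{L} = any Unicode letter, \p{M}* = zero or more combining marks following it.
    private static readonly Regex UnicodeWord = new Regex(@"^(\p{L}\p{M}*)+$");

    public static bool IsUnicodeWord(string input) => UnicodeWord.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsUnicodeWord("perch\u00E9"));  // precomposed é: True
        Console.WriteLine(IsUnicodeWord("perche\u0301")); // e + combining acute accent: True
        Console.WriteLine(IsUnicodeWord("per che"));      // contains a space: False
        // The ASCII-only pattern rejects the accented word.
        Console.WriteLine(Regex.IsMatch("perch\u00E9", "^[A-Za-z]+$")); // False
    }
}
```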

If you just need the characters a-zA-Z, you could simply iterate over the characters and check whether each one falls inside your range,
for example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This could increase your performance
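The pseudocode above could be written out roughly as follows. This is only a sketch: the names are hypothetical, and rejecting empty input is my assumption, chosen to mirror the "one or more letters" requirement. Whether it is actually faster than a regex would need measuring.

```csharp
using System;
using System.Linq;

public static class LoopCheckDemo
{
    // True for the 52 ASCII letters only.
    public static bool IsAsciiLetter(char c) =>
        ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');

    // Assumption: an empty string does not count as a word.
    public static bool IsSingleWord(string input) =>
        input.Length > 0 && input.All(c => IsAsciiLetter(c));

    public static void Main()
    {
        Console.WriteLine(IsSingleWord("word"));      // True
        Console.WriteLine(IsSingleWord("two words")); // False
        Console.WriteLine(IsSingleWord("word1"));     // False
        Console.WriteLine(IsSingleWord(""));          // False
    }
}
```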

Related

C# Regular Expression for String matching

I am looking for a regular expression that returns success only if the input string contains following characters:
a-zA-Z0-9~!#$^ ()_-+’:.?
Is this regular expression correct?
^[a-zA-Z0-9~!#$^ ()_-+’:.?]+$
I have understood what ^ means here but not sure about +$. Also are there any alternatives to this? By the way the above regular expression also includes a space character between ^ and (
it only contains the characters listed above
bool invalidCharsExist =
Regex.Replace(input, @"[a-zA-Z0-9~!#\$\^\ \(\)_\-\+’:\.\?]", "").Length != 0;
BTW: This is not fully equivalent to your regex (It will also include non-ascii letters and digits) but I think it is a better way to check
var specialChars = new HashSet<char>("~!#$^ ()_-+’:.?");
var allValid = input.All(c => char.IsLetterOrDigit(c) || specialChars.Contains(c));
Close, but get rid of that dash in the middle of your character class and put it at the beginning:
^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$
And make sure when you put it in a string that you use the proper string qualifier (I forget what it's called):
@"^[-a-zA-Z0-9~!#$^ ()_+’:.?]+$"
As to whether or not you can do it in other ways, sure, for example a negative look-ahead that doesn't actually match anything. I don't think a proper regex optimizer would leave one better than the other, it's just a matter of preference. Do you want something that looks to succeed (selects the entire string if valid), or something that looks to fail (negative look-ahead).
Honestly, if performance is at all important, you should write a good old for loop over the characters (or the equivalent LINQ implementation). Regex won't even be in the ballpark.
the regular expression would be: ^[a-zA-Z0-9~!#$^ ()_\-+’:.?]+$
I personally recommend using https://regex101.com to check regular expressions. Note that it doesn't have C# support, but JavaScript's RegExp syntax is broadly similar to C#'s, and the site gives you a particularly useful explanation of what your expression is doing. Here is this expression's explanation from there:
^ assert position at start of the string
[a-zA-Z0-9~!#$^ ()_\-\+’:.?]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
a-z a single character in the range between a and z (case sensitive)
A-Z a single character in the range between A and Z (case sensitive)
0-9 a single character in the range between 0 and 9
~!#$^ ()_ a single character in the list ~!#$^ ()_ literally
\- matches the character - literally
+’:.? a single character in the list +’:.? literally
$ assert position at end of the string
The issue with what you put in the OP was simply forgetting to escape the -, which is reserved inside the [] notation to declare a character range such as a-z.

Why is this regex not allowing this text?

I have a username validator IsValidUsername, and I am testing "baconman" but it is failing, could someone please help me out with this regex?
if(!Regex.IsMatch(str, @"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$")) {
isValid = false;
}
I want the restrictions to be: (It's very close)
Be between 5 & 17 characters long
contain at least one letter
no spaces
no special characters
You're escaping unnecessarily: if you write your regex as a verbatim string (starting with @ outside the string), you don't need both backslashes; just one is fine.
Either:
@"\w"
or
"\\w"
Edit: I didn't make this clear: right now, due to the double escaping, you're looking for a literal \ followed by a w in your regex. So your input would need [some letter]\w to match (for example, "a\w" or "a\wwwwww" would match).
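A small sketch of the effect, using a simplified version of the first alternative from the question's pattern. The class and constant names are invented for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // In a verbatim string, \\ reaches the regex engine as an escaped literal backslash.
    public const string DoubledPattern = @"^[a-zA-Z]\\w+$";
    // A single backslash gives the intended \w (word character) class.
    public const string FixedPattern = @"^[a-zA-Z]\w+$";

    public static void Main()
    {
        Console.WriteLine(@"\w" == "\\w"); // True: both strings are the two characters \ and w
        Console.WriteLine(Regex.IsMatch("baconman", DoubledPattern)); // False
        Console.WriteLine(Regex.IsMatch(@"a\www", DoubledPattern));   // True
        Console.WriteLine(Regex.IsMatch("baconman", FixedPattern));   // True
    }
}
```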
Your requirements are best taken care of in normal C#. They don't map well to a regular expression. Just code them up using LINQ which works on strings like it would on an IEnumerable<char>.
Also, understanding a query of a string is much easier than understanding a Regex with the requirements that you have.
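As a rough sketch of that approach, the four requirements could be coded as below. Allowing the underscore is my assumption (the question's regex uses \w and [0-9_]), and char.IsLetterOrDigit also accepts non-ASCII letters and digits, which may or may not be wanted.

```csharp
using System;
using System.Linq;

public static class UsernameDemo
{
    public static bool IsValidUsername(string str) =>
        str.Length >= 5 && str.Length <= 17                        // between 5 and 17 characters
        && str.All(c => char.IsLetterOrDigit(c) || c == '_')       // no spaces or special characters
        && str.Any(c => char.IsLetter(c));                         // at least one letter

    public static void Main()
    {
        Console.WriteLine(IsValidUsername("baconman"));  // True
        Console.WriteLine(IsValidUsername("12345"));     // False: no letter
        Console.WriteLine(IsValidUsername("bacon man")); // False: contains a space
        Console.WriteLine(IsValidUsername("bac"));       // False: too short
    }
}
```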
It is possible to do everything as part of a Regex, however it is not pretty :-)
^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$
It does 3 checks that always results in 1 character being matched (so we can perform the length check in the end)
Either the character is any word character \w which is before [a-zA-Z]
Or it is [a-zA-Z]
Or it is any word character \w which is after [a-zA-Z]

regex issue c# numbers are underscores now

My Regex is removing all numeric (0-9) in my string.
I don't get why all numbers are replaced by _
EDIT: I understand that my "_" regex pattern changes the characters into underscores. But not why numbers!
Can anyone help me out? I only need to remove like all special characters.
See regex here:
string symbolPattern = "[!@#$%^&*()-=+`~{}'|]";
Regex.Replace("input here 12341234" , symbolPattern, "_");
Output: "input here ________"
The problem is your pattern uses a dash in the middle, which acts as a range of the ascii characters from ) to =. Here's a breakdown:
): 41
1: 49
=: 61
As you can see, the digits (0 is 48, 9 is 57) fall within the range 41-61, so they're matched and replaced.
You need to place the - at either the beginning or end of the character class for it to be matched literally rather than act as a range:
"[-!@#$%^&*()=+`~{}'|]"
You must escape the -, because the sequence [)-=] contains the digits:
string symbolPattern = @"[!@#$%^&*()\-=+`~{}'|]";
Move the - to the end of the list so it is seen as a literal:
"[!@#$%^&*()=+`~{}'|-]"
Or, to the front:
"[-!@#$%^&*()=+`~{}'|]"
As it stands, it will match all characters in the range )-=, which includes all numerals.
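A quick sketch that reproduces the behaviour and the fix. It assumes the question's class is the keyboard symbols !@#$%^&*() and so on; the class, constant and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RangeDemo
{
    // ")-=" in the middle of the class is a range covering code points 41 to 61.
    public const string RangedPattern = "[!@#$%^&*()-=+`~{}'|]";
    // With the hyphen placed last, it is an ordinary literal character.
    public const string LiteralPattern = "[!@#$%^&*()=+`~{}'|-]";

    public static string Mask(string input, string pattern) =>
        Regex.Replace(input, pattern, "_");

    public static void Main()
    {
        Console.WriteLine((int)')'); // 41
        Console.WriteLine((int)'0'); // 48
        Console.WriteLine((int)'='); // 61
        Console.WriteLine(Mask("input here 12341234", RangedPattern));  // input here ________
        Console.WriteLine(Mask("input here 12341234", LiteralPattern)); // input here 12341234
        Console.WriteLine(Mask("a-b=c", LiteralPattern));               // a_b_c
    }
}
```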
You need to escape your special characters in your regex. For instance, * is a wildcard match. Look at what some of those special characters mean for your match.
I've not used C#, but typically the "*" character is also a control character that would need escaping.
The following matches a whole line of any characters, although the "^" and "$" are somewhat redundant:
^.*$
This matches any number of "A" characters that appear in a string:
A*
The "Owl" book from oreilly is what you really need to research this:
http://shop.oreilly.com/product/9780596528126.do?green=B5B9A1A7-B828-5E41-9D38-70AF661901B8&intcmp=af-mybuy-9780596528126.IP

Regular expressions to extract punctuations/characters

Hi guys I need a regex that only extracts punctuations/characters.
I have this so far :
[._^%$#!~@,-]+
but this matches if there is at least 1 punctuation character and still allows any other character (digit or letter).
I need it to only allow punctuation/characters.
Try this:
^[\p{S}\p{P}]+$
\p{S} matches any symbol character, and \p{P} matches any punctuation.
Note that your pattern will not match all the symbols and punctuations not present in the list.
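A minimal sketch of the Unicode-category pattern in use. The helper name and sample inputs are my own.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PunctuationDemo
{
    // \p{S} = symbols (e.g. $ + ^ ~), \p{P} = punctuation (e.g. . ! # % , - _ @).
    public static bool IsOnlySymbolsOrPunctuation(string input) =>
        Regex.IsMatch(input, @"^[\p{S}\p{P}]+$");

    public static void Main()
    {
        Console.WriteLine(IsOnlySymbolsOrPunctuation("._^%$#!~@,-")); // True
        Console.WriteLine(IsOnlySymbolsOrPunctuation("!a?"));         // False: contains a letter
        Console.WriteLine(IsOnlySymbolsOrPunctuation("#1"));          // False: contains a digit
        Console.WriteLine(IsOnlySymbolsOrPunctuation(""));            // False: + needs at least one character
    }
}
```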
Try anchoring the regex to the start and the end of the string (unless you're using Multiline matching) - i.e. ^ at the beginning and $ at the end:
^[._^%$#!~@,-]+$
Note - this does not endorse your actual pattern (I can't say whether this is matching all the 'special characters' you're talking about), but it will make it so that the entire string must be all 'special'.
[^a-zA-Z0-9]* You can try something like this. It should NOT accept letters or digits, which is easier than listing every other character besides the ones you typed.
\W
Matches any character that is not a word character (alphanumeric & underscore).

Regex for my password validation

I need a regular expression for my password format. It must ensure that the password only contains letters a-z, digits 0-9 and the special characters .@#$%&.
I am using .NET C# programming language.
This is my code:
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]$");
if (!userAndPassPattern.IsMatch(username) || !userAndPassPattern.IsMatch(password))
return false;
The problem is that I always get back false.
!A || !B is logically equivalent to !(A && B)
So you could write better
!(userAndPassPattern.IsMatch(username) && userAndPassPattern.IsMatch(password))
Then you have a special character $ in your character class; maybe you need to mask it as \$
I'm not quite sure about this, because in a character class it is not a special character. Maybe it depends on the regex engine in use. If you mask the $ it should do no harm ([a-z0-9.@#\$%&])
Then you have just a single character to match. You need a quantifier
[a-z0-9.@#$%&] means one single character out of the given, will match a or b or 0 but not ab
[a-z0-9.@#$%&]+ many characters out of the given, from 1 to endless appearances, will match a, b, and ab and ba etc.
edit
This is what you want
Regex userAndPassPattern = new Regex(@"^[a-z0-9\.@#\$%&]+$");
if (!(userAndPassPattern.IsMatch(username) && userAndPassPattern.IsMatch(password))) {
return false;
}
You forgot to add the '+' for matching one or more times:
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]+$");
This code Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]$"); will only match a username or password that is a single character long.
You are looking for something like this Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]+$"); which will match one or more of the characters in your class. The + symbol tells it to match one or more of the previous atom (which in this case is the character class you specified in the square brackets).
Also, if you did not mean to constrain the match to lowercase characters, you should add 'A-Z' to the character class: Regex userAndPassPattern = new Regex("^[A-Za-z0-9.@#$%&]+$");
You might also want to implement a minimum length restriction which can be accomplished by replacing the + with the {n,} construct, where n is the minimum length you want to match. For example:
this would match a minimum of 6 characters
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]{6,}$");
this would match a minimum of 6 and a maximum of 12
Regex userAndPassPattern = new Regex("^[a-z0-9.@#$%&]{6,12}$");
You have two problems. First . and $ need to be escaped. Second you are matching only 1 character. Add a + before the last $:
^[a-z0-9\.@#\$%&]+$
Edit: Another suggestion, if you have a minimum/maximum length you can replace the + with, for example, {6,16} or whatever you think is appropriate. This will match strings that are 6 to 16 character inclusive and reject any shorter or longer strings. If you don't care about an upper limit, you could use {6,}.
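A brief sketch of the length-bounded variant, assuming the allowed special characters are ., @, #, $, % and & and using 6 to 16 as an example range. The class and method names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LengthDemo
{
    // {6,16} replaces +: between 6 and 16 characters, all from the allowed set.
    private static readonly Regex Pattern = new Regex(@"^[a-z0-9.@#$%&]{6,16}$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("abc12"));               // False: 5 characters
        Console.WriteLine(IsValid("abc123"));              // True: 6 characters
        Console.WriteLine(IsValid("abc.123$"));            // True: allowed specials
        Console.WriteLine(IsValid("abcdefghijklmnopq"));   // False: 17 characters
        Console.WriteLine(IsValid("Abc123"));              // False: uppercase not in the class
    }
}
```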
Have you tried using a verbatim string literal when you're using regex escape sequences?
Regex userAndPassPattern = new Regex(@"^[a-z0-9@#\$%&]+$");
if (!userAndPassPattern.IsMatch(username) || !userAndPassPattern.IsMatch(password))
return false;
Your pattern only allows a single character from the set; you probably want a repetition operator like *, + or {10,}.
Your character set includes ., which outside a character class matches any character. Inside the square brackets it is treated as a literal dot, so escaping it as \. is optional but does no harm.
