Regular Expression in .NET - c#

I need to write a regex pattern in C# that checks whether an input string contains certain characters and does not contain certain others. For example:
I want the string to contain only a-z, to not contain d or b, and the whole string to be longer than 5 characters. I wrote "[a-z]{5,}"; how can I prevent the input from containing d and b?
Additional question: can I make part of the regex conditional? In other words, if some boolean variable is true, check something, and if it is false, skip that check?
Thanks

Simple regex:
/[ace-z]{5}/
This matches five consecutive characters, each of which is 'a', 'c', or any character between 'e' and 'z'.

A useful regex resource I always use is:
http://regexlib.com/
Helped me out many times.

For the first question, why not simply try this: [ace-z]{5,} ?
For the second question, can't you build the regex string based on the boolean variable before executing it?
Or, if you want to exclude some characters programmatically, you can build the regex in code by explicitly listing all the allowed characters [abcdefgh....] without the excluded ones.

If you want to exclude d and b:
[ace-z]{5,}
And yes, you can get a boolean result using the IsMatch method of the Regex class:
Regex regex = new Regex("^[ace-z]{5,}$");
if (regex.IsMatch(textBox1.Text))
{
    errorProvider1.SetError(textBox1, String.Empty);
}
else
{
    errorProvider1.SetError(textBox1, "Invalid entry");
}
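The "build the pattern from a boolean" idea suggested above can be sketched roughly as follows. This is only an illustration under assumptions: the class name LowercaseValidator, the parameter names, and the default minimum length of 5 (taken from the asker's {5,}) are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LowercaseValidator
{
    // Hypothetical helper: picks the character class at run time.
    // excludeBAndD == true  -> [ace-z] (a-z without b and d)
    // excludeBAndD == false -> [a-z]
    public static string BuildPattern(bool excludeBAndD, int minLength)
    {
        string allowed = excludeBAndD ? "ace-z" : "a-z";
        // ^ and $ anchor the pattern so the whole input must consist of allowed characters.
        return "^[" + allowed + "]{" + minLength + ",}$";
    }

    public static bool IsValid(string input, bool excludeBAndD, int minLength = 5)
    {
        return Regex.IsMatch(input, BuildPattern(excludeBAndD, minLength));
    }
}
```

With this sketch, LowercaseValidator.IsValid("abcde", true) is rejected because of b and d, while the same input passes when the flag is false. .NET also supports character class subtraction, so [a-z-[bd]] is another way to spell [ace-z].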

Related

Regex to allow some special characters c#

I have to check whether a string contains special characters, but these 5 special characters are allowed in it: .()_-
I have written my regex as
var specialCharacterSet = "^[()_-.]";
var test = Regex.Match("a!", specialCharacterSet);
var isValid = test.Success;
but it throws an error:
parsing "^[()_-.]" - [x-y] range in reverse order.
You have specified a range with -. Place it at the end:
[()_.-]
Otherwise the range is invalid: the lower bound _ appears later in the character table than the upper bound . so the range runs in reverse.
Also, if you plan to check whether any character in the string belongs to this set, you should remove the ^, which restricts the check to the beginning of the string.
To test if a string meets some pattern, use Regex.IsMatch:
Indicates whether the regular expression finds a match in the input string.
var specialCharacterSet = "[()_.-]";
var test = Regex.IsMatch("a!", specialCharacterSet);
UPDATE
To accept any string value that doesn't contain any of the five characters, you can use
var str = "file.na*me";
if (!Regex.IsMatch(str, @"[()_.-]"))
    Console.WriteLine(string.Format("{0}: Valid!", str));
else
    Console.WriteLine(string.Format("{0}: Invalid!", str));
See IDEONE demo
You can use ^[()_\-.] or ^[()_.-]. If you use special characters, it is best to put \ before any character that has a special meaning in regex.
[()_.-]
Keep - at the end or escape it to avoid forming an invalid range. A - inside a character class forms a range. Here:
_ is decimal 95
. is decimal 46.
So it is forming an invalid range from 95 to 46
var specialCharacterSet = "^[()_.-]";
var test = Regex.IsMatch("a!", specialCharacterSet);
Console.WriteLine(test);
Console.ReadLine();
Convert all special characters in a pattern to literal text using Regex.Escape(). This assumes you already have using System.Text.RegularExpressions;
string pattern = Regex.Escape("[");
Then check like this:
if (Regex.IsMatch("ab[c", pattern)) Console.WriteLine("found");
Microsoft doesn't mention Escape in the tutorial. I learned it from Perl.
An explicit way to write it in C# is [()_\-\.]. Strictly, only - needs escaping here, because it would otherwise form a range; . is already literal inside a character class, so escaping it is harmless but optional.
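To see the effect of the hyphen position for yourself, something like the following could be used. It is a small sketch, not a definitive implementation; the class and method names are invented for the example. It relies on the Regex constructor throwing ArgumentException (or a subclass of it) for a pattern it cannot parse.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: true when .NET can parse the pattern,
    // false when the Regex constructor rejects it.
    public static bool Parses(string pattern)
    {
        try
        {
            new Regex(pattern);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }

    // True when the input contains at least one of ( ) _ . -
    // The hyphen is last in the class, so it is a literal and not a range operator.
    public static bool ContainsListedChar(string input)
    {
        return Regex.IsMatch(input, @"[()_.-]");
    }
}
```

Here Parses("^[()_-.]") reports the reversed range as unparseable, while both "^[()_.-]" and @"^[()_\-.]" parse. ContainsListedChar("a!") is false because ! is not one of the five listed characters.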

C# regex match behaviour

I've got this line in my code:
Match match = Regex.Match(actualValue, regexValue, RegexOptions.None);
I've got a simple question: why, when checking for success with the line:
if(match.Success)
does the match succeed with the following values:
actualValue = "G:1"
regexValue = "A*"
The actual value does not seem to fit, at least to me, so I'm probably missing something...
What I want to achieve is to receive an actual value and a regular expression and check whether the actual value fits the regular expression. I thought that's what I did there, but apparently I didn't.
EDIT: another question. Is there a way to treat the * as an "any char" wildcard? That is, can A* be read as A followed by any characters?
Your code itself is correct; your regular expression isn't.
Based on your comments on other answers, you're after a regular expression which matches any string which starts with A, and you're assuming that '*' means "any characters". '*' in fact means "match the preceding character zero or more times", so the regular expression you've given means "match zero or more 'A' characters". Zero occurrences is an empty match, which succeeds at the very start of any string, so the pattern will match absolutely anything.
If you're looking for a regular expression that matches the whole string but only if it starts with 'A', the regular expression you're after is ^A.*. The '.' character in a regular expression means "match any character". This regular expression thus means "match the start of the string, followed by an 'A', followed by zero or more other characters" and will thus match the entire string provided it starts with 'A'.
However, you already have the whole string, so this is a little unnecessary - all you really want to do is get an answer to the question "does the string start with an 'A'?". A regular expression that will achieve this is simply '^A'. If it matches, the string started with an 'A'.
Of course, it should be pointed out that you don't need a regular expression to confirm this anyway. If this is genuinely all you want to do (and it's possible you've just put together a simple example, and your real scenario is more complicated), why not just use the StartsWith method?:
bool match = actualValue.StartsWith("A");
The regex matches because A* means "look for 0 or more occurrences of 'A'". It will match any string.
If you meant to look for an arbitrary number of 'A', but at least one, try A+ instead.
Looking at the comments, it looks like you're trying to match a lot of strings starting with A.
If they're separated by white space you could find all of them using the following:
bool matched = Regex.IsMatch(actualValue, @"\bA\w+");
This matches: "Atest flkjs Apple Ascii cAse".
If there is only one string you're matching and it starts with A and has no spaces:
bool matched = Regex.IsMatch(actualValue, @"^A\w+$");
This matches "Apple", but not "Apple and orange", as the second string has spaces.
As Chris noted, * is not a wildcard in the way you meant in regex searches. You can find some information to get you started with regexes at regex-info.
The Regex class takes the regular expression in its constructor.
An example in your case could be:
if (new Regex("A*").IsMatch(actualValue))
{
    // Do something
}
If you are unsure of the regex pattern, try it out here
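The behaviour described in these answers can be checked with a short sketch like the one below. The class and method names are hypothetical. WildcardToRegex is one possible way to get the "A followed by anything" reading the asker wanted, assuming * is the only wildcard character; it is not something the answers above propose.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarDemo
{
    // Length of the first match, or -1 when the pattern does not match at all.
    public static int MatchLength(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Length : -1;
    }

    // Hypothetical helper: treats * in the input as "any run of characters"
    // and every other character literally. Regex.Escape turns * into \*,
    // which is then replaced by the regex equivalent .*
    public static string WildcardToRegex(string wildcard)
    {
        return "^" + Regex.Escape(wildcard).Replace(@"\*", ".*") + "$";
    }
}
```

MatchLength("G:1", "A*") is 0: the match succeeds, but it is empty. With "A+" the same input gives -1. WildcardToRegex("A*") produces ^A.*$, which accepts "Apple" and rejects "G:1". Note that . does not match a newline unless RegexOptions.Singleline is set.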

Nested quantifier workaround?

I have an issue that I can't seem to find a solution for it. I'm using Regex.IsMatch to check if an input matches what's expected. If an input contains **, ++, it complains. For example, I would like to save "Message **" as an accepted value, but I keep getting an ArgumentException saying: "Nested quantifier *" whenever I try to call Regex.IsMatch on it. Is there any way to workaround this?
public bool ResponseMatch(string responseText)
{
    return Regex.IsMatch(responseText, regexPatternString);
}
It sounds like you're trying to match Message ** as a literal value? In this case, call Regex.Escape:
Regex.Escape("Message **") == "Message\ \*\*"
Then you can use it like this:
var valueToMatch = "Message **";
var matches = Regex.IsMatch(input, Regex.Escape(valueToMatch));
However, if you're just using literal values and not any regex features, you might be better off using string.Contains.
"Message *" as a regex means "match any string that has the characters m e s s a g e in that order, then match zero or more spaces".
Regex.IsMatch takes two inputs - the regex to see if it is matched, and the string to run the regex on. Seems you've got the two confused.
If you're trying to legitimately use "Message **" as a regex, you probably mean to escape the "**". If you only need to escape it in handpicked strings, then fix the string to be, say, @"Message \*\*". If you need to fix any number of regex inputs, then run Regex.Escape over the string first. Note that Regex.Escape also escapes the space: Regex.Escape("Message **") == @"Message\ \*\*"
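A minimal sketch of the Regex.Escape approach, assuming the stored value should always be treated as literal text. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralMatch
{
    // Treats 'literal' as plain text: every regex metacharacter in it is escaped first.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // True when using the text directly as a pattern makes .NET throw,
    // as happens with the nested quantifier in "Message **".
    public static bool ThrowsAsRawPattern(string rawPattern)
    {
        try
        {
            Regex.IsMatch(string.Empty, rawPattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

ThrowsAsRawPattern("Message **") is true, while ContainsLiteral("Message ** received", "Message **") is true without any exception. If no regex features are needed at all, input.Contains("Message **") gives the same answer more simply.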

Regular Expression: single word

I want to check in a C# program whether a user input is a single word. The word may only have characters A-Z and a-z. No spaces or other characters.
I tried [A-Za-z]*, but this doesn't work. What is wrong with this expression?
Regex regex = new Regex("[A-Za-z]*");
if (!regex.IsMatch(userinput))
{
    ...
}
Can you recommend a website with a comprehensive list of regex examples?
It probably works, but you aren't anchoring the regular expression. You need to use ^ and $ to anchor the expression to the beginning and end of the string, respectively:
Regex regex = new Regex("^[A-Za-z]+$");
I've also changed * to + because * will match 0 or more times while + will match 1 or more times.
You should add anchors for start and end of string: ^[A-Za-z]+$
Regarding the question of regex examples have a look at http://regexlib.com/.
For the regex, have a look at the special characters ^ and $, which represent starting and ending of string. This site can come in handy when constructing regexes in the future.
The asterisk character in regex specifies "zero or more of the preceding character class".
This explains why your expression is failing, because it will succeed if the string contains zero or more letters.
What you probably intended was to have one or more letters, in which case you should use the plus sign instead of the asterisk.
Having made that change, now it will fail if you enter a string that doesn't contain any letters, as you intended.
However, this still won't work for you entirely, because it will allow other characters in the string. If you want to restrict it to only letters, and nothing else, then you need to provide the start and end anchors (^ and $) in your regex to make the expression check that the 'one or more letters' is attached to the start and end of the string.
^[a-zA-Z]+$
This should work as intended.
Hope that helps.
For more information on regex, I recommend http://www.regular-expressions.info/reference.html as a good reference site.
I don't know what the C#'s regex syntax is, but try [A-Za-z]+.
Try ^[A-Za-z]+$. If you don't include the ^ and $, it will match on any part of the string that has alphabetic characters in it.
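The difference between the unanchored and the anchored pattern can be seen side by side in a sketch like this one. The class and member names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleWord
{
    // The asker's original pattern: unanchored, and * allows zero letters.
    private static readonly Regex Unanchored = new Regex("[A-Za-z]*");

    // The pattern from the answers: one or more letters, spanning the whole input.
    private static readonly Regex Anchored = new Regex("^[A-Za-z]+$");

    public static bool UnanchoredIsMatch(string s)
    {
        return Unanchored.IsMatch(s);
    }

    public static bool IsSingleWord(string s)
    {
        return Anchored.IsMatch(s);
    }
}
```

UnanchoredIsMatch returns true for "two words", "123" and even the empty string, because an empty match is always available. IsSingleWord accepts "Hello" and rejects all three. One caveat: in .NET, $ also matches just before a final newline, so \z can be used instead of $ if a trailing "\n" must be rejected.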
I know the question is only about strictly alphabetic input, but here's an interesting way of solving this which does not break on accented letters and other such special characters.
The regex "^\b.+?\b" will match the first word on the start of a string, but only if the string actually starts with a valid word character. Using that, you can simply check if A) the string matches, and B) the length of the matched string equals your full string's length:
public Boolean IsSingleWord(String userInput)
{
    Regex firstWordRegex = new Regex("^\\b.+?\\b");
    Match firstWordMatch = firstWordRegex.Match(userInput);
    return firstWordMatch.Success && firstWordMatch.Length == userInput.Length;
}
The other answers have covered how to solve the problem you know about. Now I'll speak about the problem you perhaps don't know about: diacritics :-) Your solution doesn't support àèéìòù and many other letters. A correct solution would be:
^(\p{L}\p{M}*)+$
where \p{L} is any letter and \p{M}* is zero or more diacritic marks (in Unicode, diacritics can be "separated" from base letters, so you can have something like a + ` = à, or you can have precomposed characters like the standard à).
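As a rough illustration of the Unicode-aware pattern, the sketch below compares it with the ASCII-only one. The class name is hypothetical, and the non-ASCII characters are written as \u escapes so the example does not depend on the source file's encoding.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeWord
{
    // One or more letters, each optionally followed by combining marks.
    public static bool IsSingleWord(string s)
    {
        return Regex.IsMatch(s, @"^(\p{L}\p{M}*)+$");
    }

    // The ASCII-only pattern from the earlier answers, for comparison.
    public static bool IsAsciiWord(string s)
    {
        return Regex.IsMatch(s, "^[A-Za-z]+$");
    }
}
```

"perch\u00E9" (precomposed é) and "a\u0300" (a followed by a combining grave accent) both pass IsSingleWord, while IsAsciiWord rejects them. A string with a space, such as "due parole", fails both.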
If you just need the characters a-zA-Z, you could simply iterate over the characters and check that each one is inside your range, for example:
for each character c: ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
This could improve performance.
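The per-character check described above might look like this in C#. It is a sketch with invented names; whether it is actually faster than a regex for a given workload has not been measured here.

```csharp
using System;

public static class AsciiWord
{
    // The range test from the answer above, for a single character.
    public static bool IsAsciiLetter(char c)
    {
        return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z');
    }

    // True when the input is non-empty and every character is an ASCII letter.
    public static bool IsSingleWord(string s)
    {
        if (string.IsNullOrEmpty(s))
        {
            return false;
        }

        foreach (char c in s)
        {
            if (!IsAsciiLetter(c))
            {
                return false;
            }
        }

        return true;
    }
}
```

The explicit empty-string check mirrors the + quantifier in ^[A-Za-z]+$: without it, an empty input would pass because the loop body never runs.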

Regular expression for numbers in string

The input string "134.45sdfsf" passed to the following statement
System.Text.RegularExpressions.Regex.Match(input, pattern).Success;
returns true for following patterns.
pattern = "[0-9]+"
pattern = "\\d+"
Q1) What the hell? I am specifying only digits, not special characters or letters. So what is wrong with my pattern, and what pattern would make the statement above return false?
Q2) Once I get the right pattern to match just the digits, how do I extract all the numbers in a string?
Let's say for now I just want to get the integers in a string in the format "int.int^int" (for example, "11111.222^3333"; in this case, I want to extract the strings "11111", "222" and "3333").
Any idea?
Thanks
You are specifying that it contains at least one digit anywhere, not they are all digits. You are looking for the expression ^\d+$. The ^ and $ denote the start and end of the string, respectively. You can read up more on that here.
Use Regex.Split to split on runs of non-digit characters. For example:
string input = "123&$456";
var isAllDigit = Regex.IsMatch(input, @"^\d+$");
var numbers = Regex.Split(input, @"[^\d]+");
It returns true because the pattern is found somewhere in the string.
If you want the whole string to be checked, use:
^[0-9]+$
Q1) Both patterns are correct.
Q2) Assuming you are looking for a number pattern "5 digits-dot-3 digits-^-4 digits", here is what you're looking for:
var regex = new Regex(@"(?<first>[0-9]{5})\.(?<second>[0-9]{3})\^(?<third>[0-9]{4})");
var match = regex.Match("11111.222^3333");
Debug.Print(match.Groups["first"].ToString());
Debug.Print(match.Groups["second"].ToString());
Debug.Print(match.Groups["third"].ToString());
I prefer named capture groups; they give a clearer way to access the captured values.
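For the general case, where the number of digits in each part is not fixed, Regex.Matches can collect every run of digits. The sketch below is one way to do it, with hypothetical class and method names; it uses [0-9] rather than \d on the assumption that only ASCII digits are wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Anchored: true only when the whole input consists of digits.
    public static bool IsAllDigits(string input)
    {
        return Regex.IsMatch(input, "^[0-9]+$");
    }

    // Unanchored on purpose: returns every run of digits found in the input, in order.
    public static List<string> ExtractNumbers(string input)
    {
        var numbers = new List<string>();
        foreach (Match m in Regex.Matches(input, "[0-9]+"))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }
}
```

For "11111.222^3333" ExtractNumbers yields "11111", "222" and "3333", and for "134.45sdfsf" it yields "134" and "45", while IsAllDigits is false for both inputs. Unlike Regex.Split, this approach does not produce empty strings when the input starts or ends with a non-digit.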
