DataAnnotation RegularExpression problems - c#

Why doesn't RegularExpressionAttribute validation compare the input string with the value of all matches concatenated?
I asked a question here about the scenario below, but I found a solution the next day and thought it better to raise the issue here.
[Required(
AllowEmptyStrings = false,
ErrorMessage = "Required")]
[RegularExpression(
"^[^0]{1}|..+",
ErrorMessage = "Expressao Regular")]
public string EncryptedValue { get; set; }
Except for an empty string or "0", the property should be valid in ModelState, but validation fails.
You can test the expression and the value HERE.
Expression
^[^0]{1}|..+
Value
+iCMEBYZQtWbnU2RPX/MmqrDPuVJzSGGWhkFd+9/zpMbHVoOlZFuF9ND1xAxsQy3YFCPIsUBEgg2RJNkPefrmQ==
You will notice that the expression matches, but with two matches. The first match by itself is not equal to the input string; you need to concatenate the values of both matches to get it.
But apparently this is not done in ModelState validation. The same happens with jquery.validate.unobtrusive (with jQuery I need to click the submit button twice to see it, but it happens).
Solution
You need to build an expression whose first match covers the input string completely.
When you build an expression to validate a field, every alternative of an OR must match the whole input string.
So whenever you write an expression with OR operators, always order the alternatives from the longest input to the shortest.
In this case:
From ^[^0]{1}|..+ to ..+|^[^0]{1}

Let's take a look at the class System.ComponentModel.DataAnnotations.RegularExpressionAttribute. We are interested in the following method:
[__DynamicallyInvokable]
public override bool IsValid(object value)
{
    this.SetupRegex();
    string input = Convert.ToString(value, (IFormatProvider) CultureInfo.CurrentCulture);
    if (string.IsNullOrEmpty(input))
        return true;
    Match match = this.Regex.Match(input);
    if (match.Success && match.Index == 0)
        return match.Length == input.Length;
    return false;
}
In our case we have the regex "^[^0]{1}|..+" and the input string +iCMEBYZQtWbnU2RPX/MmqrDPuVJzSGGWhkFd+9/zpMbHVoOlZFuF9ND1xAxsQy3YFCPIsUBEgg2RJNkPefrmQ==. The regex produces two matches: the first is + (the first character) and the second is the rest of the string. The first match is shorter than the input string, which is why IsValid returns false.
Why doesn't RegularExpressionAttribute validation compare the input string with the value of all matches concatenated? Because it only looks at the first match.
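The check quoted above can be tried outside MVC. This is only a minimal sketch of the same rule, not the framework's code; FullMatchDemo and IsFullMatch are hypothetical names introduced for illustration.

```csharp
using System.Text.RegularExpressions;

public static class FullMatchDemo
{
    // Hypothetical helper mirroring the rule in IsValid: only the FIRST match
    // is examined, and it must start at index 0 and span the whole input.
    public static bool IsFullMatch(string pattern, string input)
    {
        if (string.IsNullOrEmpty(input))
            return true; // empty input is left to [Required]

        Match match = Regex.Match(input, pattern);
        return match.Success && match.Index == 0 && match.Length == input.Length;
    }
}
```

With the value from the question, IsFullMatch("^[^0]{1}|..+", value) returns false because the first match is just "+", while IsFullMatch("..+|^[^0]{1}", value) returns true because the first alternative already consumes the whole string.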

So, if your regex is ^[^0]{1}|..+
And your value is: +iCMEBYZQt...
One match will be +, and the other iCMEBYZQt....
The thing is, with the first part, ^[^0]{1}, your regex tries to match any character that is not 0, exactly once, at the beginning of the line/string.
So the regex looks at the string and says something like:
Voilà! I have something that is not zero, at the beginning, and it is one character: +. Thanks, thanks, where is my cookie?!
Also, the OR operator (|) tells the regex to be more flexible about the patterns, something like: if you don't find the first expression, don't feel bad, you can also look for this other thing. But once the regex has its first cookie, it makes no sense to go back and look for the second expression (..+), because it has already found a solution with the first expression, and the regex knows there will be no extra cookies for the same work. So life continues and we have to move on.
Here you can see that the first and the second expressions are independent ways to satisfy the regex.
https://regex101.com/r/UWtcF8/3
Your second regex is ..+|^[^0]{1}:
https://regex101.com/r/la3wDa/1
Here the order of the expressions is swapped. It is the same regex, but the first expression it will try to satisfy is ..+, which basically means "give me anything with 2 or more characters". So, for example, if you have 00, the regex will say:
Yumm, cookies !
And that can be fine or not, it will depend on what you want to solve.
Now another option is:
^(?!0$).+
https://regex101.com/r/gxJdch/1
Where we are going to look for anything but a single zero.
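To see how ^(?!0$).+ behaves under the attribute's full-match rule, here is a small sketch. NotSingleZero is a hypothetical class name, and the full-match check is my restatement of the IsValid logic, not code from the framework.

```csharp
using System.Text.RegularExpressions;

public static class NotSingleZero
{
    // (?!0$) rejects an input that is exactly "0"; .+ then consumes the rest.
    private static readonly Regex Pattern = new Regex(@"^(?!0$).+");

    // Applies the same rule as RegularExpressionAttribute: the first match
    // must start at index 0 and cover the entire input.
    public static bool IsValid(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success && m.Index == 0 && m.Length == input.Length;
    }
}
```

Here "0" is rejected, while "00", "01" and the long value from the question are accepted, since there is only one alternative and it always tries to consume the whole string.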
Useful and fun links:
https://regex101.com/ (Even when you are not working in PHP, you can select the PHP flavor and use the regex debugger to learn more about regex!)
https://regexper.com/

Related

Regex group matching without word boundaries

I am trying to create a function that returns true when the string doesn't have a particular group of chars (in this example the group is "DontMatchMe")
so, of the following examples:
example1
examDontMatchMeple2
example3
examDontMatchMeple4
example4
valid matches are:
example1
example3
example4
my first option was to use the pattern .*(?!DontMatchMe).* but .* is consuming everything, the match is always true.
Note that the values on the string I am actually using are random. I cannot use "exe" to build the regex, for example. the "DontMatchMe" is also random.
In order to exclude a specific word, you can use a pattern like this: ^(?!.*DontMatchMe).+
To avoid the issue with .* consuming everything you can anchor the pattern to the beginning of the string. The pattern break-down is as follows:
^: anchor to the beginning of the string
(?!.*DontMatchMe): negative look-ahead that fails the match if the text to be ignored appears anywhere in the string
.+: finally, match one or more characters (which happens only if the look-ahead did not find the text)
Example:
string[] inputs =
{
    "example1",
    "examDontMatchMeple2",
    "example3",
    "examDontMatchMeple4",
    "example4"
};
string ignoreText = "DontMatchMe";
string pattern = String.Format("^(?!.*{0}).+", Regex.Escape(ignoreText));
foreach (var input in inputs)
{
    Console.WriteLine("{0}: {1}", input, Regex.IsMatch(input, pattern));
}
If it's a simple non-regex string that you want to check for, you can simply use the Contains method and invert the result:
bool doesNotContain(string s, string group) {
// error check for nulls first (not included here)
return !s.Contains(group);
}
If you want your group to possibly be a regex, you can still use the same principle. Look for the pattern you don't want, and if it's there return false, otherwise return true. This is probably easier to read and understand, particularly for people not familiar with the more advanced concepts of regular expressions, like negative lookaheads.
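A minimal sketch of that "search for the unwanted pattern and invert" idea follows. Exclusion and DoesNotMatch are hypothetical names; for a literal, randomly generated group you would pass it through Regex.Escape first.

```csharp
using System.Text.RegularExpressions;

public static class Exclusion
{
    // Returns true when the unwanted pattern occurs nowhere in the input.
    // unwantedPattern is treated as a regular expression, not literal text.
    public static bool DoesNotMatch(string input, string unwantedPattern)
    {
        return !Regex.IsMatch(input, unwantedPattern);
    }
}
```

For example, DoesNotMatch("example1", "DontMatchMe") is true and DoesNotMatch("examDontMatchMeple2", "DontMatchMe") is false, with no look-ahead needed.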

C# regex match behaviour

I've got this line in my code:
Match match = Regex.Match(actualValue, regexValue, RegexOptions.None);
I've got a simple question: why, when checking for success with the line:
if(match.Success)
then the match does succeed with the following values:
actualValue = "G:1"
regexValue = "A*"
The actual value does not seem to fit, at least to me, so I'm probably missing something...
What I want to achieve is just to receive an actual value and a regular expression and check whether the actual value fits the regular expression. I thought that's what I did there, but apparently I didn't.
EDIT: another question. Is there a way to treat the * as the "any char" wildcard? That is, can A* be interpreted as A followed by any characters?
Your code itself is correct; your regular expression isn't.
Based on your comments on other answers, you're after a regular expression which matches any string which starts with A, and you're assuming that '*' means "any characters". '*' in fact means "match the preceding character zero or more times", so the regular expression you've given means "match zero or more 'A' characters". Because zero occurrences are allowed, it matches the empty string at the start of any input, so it will match absolutely anything.
If you're looking for a regular expression that matches the whole string but only if it starts with 'A', the regular expression you're after is ^A.*. The '.' character in a regular expression means "match any character". This regular expression thus means "match the start of the string, followed by an 'A', followed by zero or more other characters" and will thus match the entire string provided it starts with 'A'.
However, you already have the whole string, so this is a little unnecessary - all you really want to do is get an answer to the question "does the string start with an 'A'?". A regular expression that will achieve this is simply '^A'. If it matches, the string started with an 'A'.
Of course, it should be pointed out that you don't need a regular expression to confirm this anyway. If this is genuinely all you want to do (and it's possible you've just put together a simple example, and your real scenario is more complicated), why not just use the StartsWith method?:
bool match = actualValue.StartsWith("A");
The regex matches because A* means "look for 0 or more occurrences of 'A'". It will match any string.
If you meant to look for an arbitrary number of 'A', but at least one, try A+ instead.
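The zero-length match is easy to observe. The sketch below uses hypothetical helper names (StarDemo, Succeeds, FirstMatchLength) and only standard Regex calls.

```csharp
using System.Text.RegularExpressions;

public static class StarDemo
{
    // True when the pattern matches anywhere in the input.
    public static bool Succeeds(string input, string pattern)
    {
        return Regex.IsMatch(input, pattern);
    }

    // Length of the first match; 0 means the match is empty.
    public static int FirstMatchLength(string input, string pattern)
    {
        return Regex.Match(input, pattern).Length;
    }
}
```

For "G:1" and "A*", Succeeds returns true but FirstMatchLength returns 0: the regex matched nothing, successfully. With "A+" or "^A" the same input no longer matches.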
Looking at the comments it looks like you're trying to match a lot of strings starting with A.
If they're separated by white space you could find all of them using the following:
bool matched = Regex.IsMatch(actualValue, @"\bA\w+");
This matches Atest, Apple and Ascii in "Atest flkjs Apple Ascii cAse".
If there is only one string you're matching and it starts with A and has no spaces:
bool matched = Regex.IsMatch(actualValue, @"^A\w+$");
This matches "Apple", but not "Apple and orange" as the second string has spaces.
As Chris noted * is not a wildcard in the way you meant with regex searches. You can find some information to get you started with regexes at regex-info.
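If you really do want * to behave like a shell-style wildcard, one common approach is to translate the wildcard text into a regex first. This is a sketch under that assumption; Wildcard, ToRegex and IsMatch are hypothetical names, and only * is given special meaning.

```csharp
using System.Text.RegularExpressions;

public static class Wildcard
{
    // Escapes every regex metacharacter, then turns the escaped '*' back into
    // ".*" and anchors the result so the whole input has to fit.
    public static string ToRegex(string wildcard)
    {
        return "^" + Regex.Escape(wildcard).Replace(@"\*", ".*") + "$";
    }

    public static bool IsMatch(string input, string wildcard)
    {
        return Regex.IsMatch(input, ToRegex(wildcard));
    }
}
```

With this translation "A*" becomes ^A.*$, so "Apple" fits and "G:1" does not.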
Regex takes the regular expression in the constructor.
An example in your case could be:
if (new Regex("A*").IsMatch(actualValue))
{
    // Do something
}
If you are unsure of the regex pattern, try it out here

how to create regular expression based on some condition

I want to create a regular expression to find and replace uppercase characters based on some conditions.
Find the start of each group of uppercase characters in a string, replace the group with lowercase, and put a * before the starting uppercase character.
If any lowercase character follows an uppercase character, replace that uppercase character with lowercase and put a * before it.
input string : stackOVERFlow
expected output : stack*over*flow
i tried but could not get it working perfectly.
Any idea on how to create a regular expression ?
Thanks
Well the expected inputs and outputs are slightly illogical: you're lower-casing the "f" in "flow" but not including it in the asterisk.
Anyway, the regex you want is pretty simple: @"[A-Z]+". This matches a run of one or more uppercase letters. Keep the quantifier greedy: a lazy [A-Z]+? would match one letter at a time and wrap every uppercase letter in its own pair of asterisks.
Now, to do the find/replace, you would do something like the following:
Regex.Replace(inputString, @"([A-Z]+)", "*$1*").ToLower();
This simply finds all runs of one or more uppercase letters, and wherever it finds a match it replaces it with itself surrounded by asterisks. This does the surrounding but not the lowercasing; .NET replacement patterns don't provide for that kind of string modification. However, since the end result of the operation should be a string with all lowercase chars, just do exactly that with a ToLower() and you'll get the lowercased result.
KeithS's solution can be simplified a bit
Regex.Replace("stackOVERFlow","[A-Z]+","*$0*").ToLower()
However, this will yield stack*overf*low including the f between the stars. If you want to exclude the last upper case letter, use the following expression
Regex.Replace("stackOVERFlow","[A-Z]+(?=[A-Z])","*$0*").ToLower()
It will yield stack*over*flow
This uses the pattern find(?=suffix), which finds a position before a suffix.
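The difference between these patterns is easiest to see side by side. The helper below is a hypothetical wrapper (StarWrap.Wrap) around the same Regex.Replace call used in the answers.

```csharp
using System.Text.RegularExpressions;

public static class StarWrap
{
    // Surrounds every match of the pattern with asterisks ($0 is the whole
    // match), then lowercases the complete result.
    public static string Wrap(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "*$0*").ToLowerInvariant();
    }
}
```

For "stackOVERFlow", the pattern [A-Z]+ gives stack*overf*low, the look-ahead version [A-Z]+(?=[A-Z]) gives stack*over*flow, and a lazy [A-Z]+? wraps each letter separately. Note that the look-ahead version always leaves the last letter of an uppercase run outside the asterisks, whether or not a lowercase letter follows.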

Match pattern of [0-9]-[0-9]-[0-9], but without matching [0-9]-[0-9]

I'm not sure how to accomplish this with a regular expression (or if I can; I'm new to regex). I have an angle value the user will type in and I'm trying to validate the entry. It is in the form degrees-minutes-seconds. The problem I'm having, is that if the user mistypes the seconds portion, I have to catch that error, but my match for degrees-minutes is a success.
Perhaps the method will explain better:
private Boolean isTextValid(String _angleValue) {
Regex _degreeMatchPattern = new Regex("0*[1-9]");
Regex degreeMinMatchPattern = new Regex("(0*[0-9]-{1}0*[0-9]){1}");
Regex degreeMinSecMatchPattern = new Regex("0*[0-9]-{1}0*[0-9]-{1}0*[0-9]");
Match _degreeMatch, _degreeMinMatch, _degreeMinSecMatch;
_degreeMinSecMatch = degreeMinSecMatchPattern.Match(_angleValue);
if (_degreeMinSecMatch.Success)
return true;
_degreeMinMatch = degreeMinMatchPattern.Match(_angleValue);
if (_degreeMinMatch.Success)
return true;
_degreeMatch = _degreeMatchPattern.Match(_angleValue);
if (_degreeMatch.Success)
return true;
return false;
}
}
I want to check for degrees-minutes if the degrees-minutes-seconds match is unsuccessful, but only if the user didn't enter any seconds data. Can I do this via regex, or do I need to parse the string and evaluate each portion separately? Thanks.
EDIT: Sample data would be 45-23-10 as correct data. The problem is that 45-23 is also valid data; the 0 seconds is understood. So if the user types 45-23-1= by accident, the degreeMinMatchPattern regex in my code will match successfully, even though the input is invalid.
Second EDIT: Just to make it clear, the minutes and second portions are both optional. The user can type 45 and that is valid.
You can specify "this part of the pattern must match at least m times" using the {m,} syntax. Since there are hyphens between each component, specify the first part separately, and then require the hyphen-digit group at least twice:
[0-9](-[0-9]){2,}
You can also shorten [0-9] to \d: \d(-\d){2,}
First off, a character in a regex is matched once by default, so {1} is redundant.
Second, since you can apparently isolate this value (you prompt for just this value, instead of having to look for it in a paragraph of entered data) you should include ^ and $ in your string, to enforce that the string should contain ONLY this pattern.
Try "^\d{1,3}-\d{1,2}(-\d{1,2})?$".
Breaking it down: ^ matches the beginning of the string. \d matches any single decimal digit, and then behind that you're specifying {1,3}, which will match a run of one to three digits. Then you're looking for one dash, then a similar digit pattern but only one or two times. The last term is enclosed in parentheses so we can group the characters. Its form is similar to the first two, then there's a ? which marks the preceding group as optional. The $ at the end indicates that the input should end. Given this, it will match 222-33-44 or 222-33, but not 222-3344 or 222-33-abc.
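The question's second edit says minutes are optional too. One way to allow that, as a variation of the pattern above rather than something the answer proposed, is to nest the optional groups. AngleRegex is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class AngleRegex
{
    // degrees, optionally followed by -minutes, optionally followed by -seconds.
    // Seconds can only appear when minutes are present.
    private static readonly Regex Pattern =
        new Regex(@"^\d{1,3}(-\d{1,2}(-\d{1,2})?)?$");

    public static bool IsValid(string text)
    {
        return Pattern.IsMatch(text);
    }
}
```

This accepts 45, 45-23 and 45-23-10, and rejects 45-23-1= because the anchors force the whole input to fit the pattern. It checks only the shape of the input, not the numeric ranges.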
Keep in mind there are additional rules you might want to incorporate. For instance, seconds can be expressed as a decimal (if you want a resolution smaller than one second). You would need to optionally expect the decimal point and one or more additional digits. Also, you probably have a maximum degree value; the above regex will match the maximum integer DMS value of 359-59-59, however it will also match 999-99-99 which is not valid. You can limit the maximum value using regex (for example "(3[0-5]\d|[1-2]\d{2}|\d{1,2})" will match any number from 0 to 359, by matching a 3, then 0-5, then 0-9, OR any 3-digit number starting with 1 or 2, OR any one- or two-digit number), but as the example shows the regex will get long and messy, so document it well in code as to what you're doing.
Maybe you would do better to just parse the input out and check each piece separately.
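A parsing version might look like the sketch below. AngleParser is a hypothetical name, and the limits (0-359 degrees, 0-59 minutes and seconds) are assumptions taken from the ranges mentioned above, not requirements stated in the question.

```csharp
using System.Globalization;

public static class AngleParser
{
    // Accepts "d", "d-m" or "d-m-s" and range-checks each piece.
    public static bool IsValid(string text)
    {
        if (string.IsNullOrEmpty(text))
            return false;

        string[] parts = text.Split('-');
        if (parts.Length > 3)
            return false;

        int[] max = { 359, 59, 59 }; // assumed upper bounds per component
        for (int i = 0; i < parts.Length; i++)
        {
            int value;
            // NumberStyles.None permits digits only: no sign, no whitespace.
            if (!int.TryParse(parts[i], NumberStyles.None,
                              CultureInfo.InvariantCulture, out value))
                return false;
            if (value > max[i])
                return false;
        }
        return true;
    }
}
```

Unlike the regex, this rejects 999-99-99 without any extra pattern work, and a mistyped 45-23-1= fails because "1=" is not a number.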
I'm not sure I understand correctly, but I think
(?<degrees>0*[0-9])-?(?<minutes>0*[0-9])(?:-?(?<seconds>0*[0-9]))?$
might work.
But this is quite ambiguous; also I'm wondering why you're only allowing single-digit degree/minute/second values. Please show some examples you do and don't want to match.
Maybe you should try something like this and test for empty/invalid groups:
// The regex variable must not share a name with the local "degrees" string below.
Regex anglePattern = new Regex(
    @"(?<degrees>\d+)(?:-(?<minutes>\d+))?(?:-(?<seconds>\d+))?");
string[] samples = new[] { "123", "123-456", "123-456-789" };
foreach (var sample in samples)
{
    Match m = anglePattern.Match(sample);
    if (m.Success)
    {
        // A group that did not take part in the match has an empty Value.
        string degrees = m.Groups["degrees"].Value;
        string minutes = m.Groups["minutes"].Value;
        string seconds = m.Groups["seconds"].Value;
        Console.WriteLine("{0}°{1}'{2}\"", degrees,
            String.IsNullOrEmpty(minutes) ? "0" : minutes,
            String.IsNullOrEmpty(seconds) ? "0" : seconds
        );
    }
}

Does this regex expression allow "*"?

I really know very little about regex's.
I'm trying to test a password validation.
Here's the regex that describes it (I didn't write it, and don't know what it means):
private static string passwordField = "[^A-Za-z0-9_.\\-!@#$%^&*()=+;:'\"|~`<>?\\/{}]";
I've tried a password like "dfgbrk*", and my code, using the above regex, allowed it.
Is this consistent with what the regex defines as acceptable, or is it a problem with my code?
Can you give me an example of a string that validation using the above regex isn't supposed to allow?
Added: Here's how the original code uses this regex (and it works there):
public static bool ValidateTextExp(string regexp, string sText)
{
if ( sText == null)
{
Log.WriteWarning("ValidateTextExp got null text to validate against regExp {0} . returning false",regexp);
return false;
}
return (!Regex.IsMatch(sText, regexp));
}
It seems I'm doing something wrong..
Thanks.
Your regex matches a value that contains any single character which is not in that list.
Your test value matches because it has spaces in it, which do not appear to be in your expression.
The reason they are not allowed is that your character class starts with ^, which negates it. The reason it matches any value that contains any single character outside the list is that you did not specify the beginning or end of the string, or any quantifiers.
The above assumes I'm not missing the importance of any of the characters in the middle of the character soup :)
This answer is also dependent on how you actually use the Regex in code.
If your intention was for that Regex string to represent the only characters that are actually allowed in a password, you would change the regex like so:
string pattern = "^[A-Z0-9...etc...]+$";
The important parts there are:
The ^ has been removed from inside the bracket, to outside; where it signifies the start of the whole string.
The $ has been added to the end, where it signifies the end of the whole string.
Those are needed because otherwise, your pattern will match anything that contains the valid values anywhere inside - even if invalid values are also present.
finally, I've added the + quantifier, which means you want to find any one of those valid characters, one or more times. (this regex would not permit a 0-length password)
If you wanted to permit the ^ character also as part of the password, you would add it back in between the brackets, but just not as the first thing right after the opening bracket [. So for example:
string pattern = "^[A-Z0-9^...etc...]+$";
The ^ has special meaning in different places at different times in Regexes.
[^A-Za-z0-9_.\-!@#$%^&*()=+;:'\"|~`<>?\/{}]
----------------------^
Looks fine to me, at least in regards to your question title. I'm not clear yet on why the spaces in your sample don't trip it up.
Note that I'm assuming the purpose of this expression is to find invalid characters. Thus, if the expression is a positive match, you have a bad password that you must reject. Since there appears to be some confusion about this, perhaps I can clear it up with a little pseudo-code:
bool isGoodPassword = !Regex.IsMatch(requestedPassword, @"[^A-Za-z0-9_.\-!...]");
You could rewrite this for a positive match (without the negation) like so:
bool isGoodPassword = Regex.IsMatch(requestedPassword, @"^[A-Za-z0-9_.\-!...]+$");
The new expression matches a string that, from beginning to end, consists of one or more of the characters in the list. Any character not in the list would cause the match to fail.
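Both forms can be compared with a short sketch. The allowed-character list here is a deliberately shortened, hypothetical one (letters, digits and _ . - ! *), not the list from the question, and PasswordCheck is an invented name.

```csharp
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Hypothetical, shortened allow-list used only for illustration.
    private const string Allowed = @"A-Za-z0-9_.\-!*";

    // Negated form: the password is good when no character outside the list is found.
    public static bool IsValidByRejection(string password)
    {
        return !Regex.IsMatch(password, "[^" + Allowed + "]");
    }

    // Positive form: the password is good when every character is in the list.
    public static bool IsValidByAllowList(string password)
    {
        return Regex.IsMatch(password, "^[" + Allowed + "]+$");
    }
}
```

Both methods accept "dfgbrk*" and reject "dfg brk" because of the space. Note that they differ on the empty string: the negated form accepts it (nothing invalid was found), while the positive form requires at least one character.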
Your regular expression is just an inverted character class and describes just one single character (but that can’t be *). So it depends on how you use that character class.
Depends on how you apply it. It describes exactly one character; however, the ^ at the beginning bugs me a little, as it prohibits every other character, so there is probably something terribly fishy there.
Edit: as pointed out in other answers, the reason for your string to match is the space, not the explanation that was replaced by this line.
