Nested quantifier workaround?

I have an issue I can't find a solution for. I'm using Regex.IsMatch to check whether an input matches an expected value. If the expected value contains ** or ++, it throws. For example, I would like to save "Message **" as an accepted value, but I keep getting an ArgumentException saying "Nested quantifier *" whenever I call Regex.IsMatch with it. Is there any way to work around this?
public bool ResponseMatch(string responseText)
{
    return Regex.IsMatch(responseText, regexPatternString);
}

It sounds like you're trying to match Message ** as a literal value? In this case, call Regex.Escape:
Regex.Escape("Message **") == @"Message\ \*\*"
Then you can use it like this:
var valueToMatch = "Message **";
var matches = Regex.IsMatch(input, Regex.Escape(valueToMatch));
However, if you're just using literal values and not any regex features, you might be better off using string.Contains.
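As a minimal sketch of the advice above: the class and method names below are hypothetical, chosen only for illustration. It shows that the unescaped value is rejected by the regex parser, while the escaped value matches literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralMatchSketch
{
    // Returns true if 'input' contains 'literal' verbatim.
    // Regex.Escape turns every metacharacter in 'literal' into an escaped literal.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }

    // Returns true if the regex parser rejects 'pattern',
    // as it does for "Message **" (nested quantifier).
    public static bool IsInvalidPattern(string pattern)
    {
        try
        {
            Regex.IsMatch(string.Empty, pattern);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

If no regex features are needed at all, input.Contains("Message **") gives the same answer as ContainsLiteral without involving the regex engine.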

"Message *" as a regex means "match any string that has the characters m e s s a g e in that order, then match zero or more spaces".
Regex.IsMatch takes two inputs - the regex to see if it is matched, and the string to run the regex on. Seems you've got the two confused.
If you're legitimately trying to use "Message **" as a regex, you probably mean to escape the "**". If you only need to escape it in handpicked strings, then fix the string to be, say, @"Message \*\*". If you need to fix any number of regex inputs, then run Regex.Escape over the string first. Regex.Escape("Message **") returns @"Message\ \*\*" (it escapes the space as well), which matches the same text as @"Message \*\*".

Related

C# Regex - perform wildcard search and need to escape certain characters with linq

I'm using EF and I've implemented a search function that supports wildcard search.
For example, if the search input is %CU%P%, I format it to ^.*?CU.*P.*?$ and select with Regex.IsMatch, similar to SQL LIKE '%CU%P%'. Example:
// Converts the % wildcards, e.g. %CU%P% becomes ^.*?CU.*P.*?$
string regexSearch = GeneralHelper.RegexWildCardSearchPattern(requestDto.Name);
Regex regex = new Regex(regexSearch, RegexOptions.IgnoreCase);
allCompetitionList = allCompetitionList.Where(x => regex.IsMatch(x.CompetitionName)).ToList();
But when the user gives me something like %????%, I hit the "Nested quantifier" error.
If the user gives me something like %???%, nothing is returned, even though my DB does have a record like ????.
If I manually change the input to %\\\?\\\?\\\?%, it returns the ???? record.
I guess I need to escape every character that needs escaping, except %, by prefixing it with \\. How can I achieve that?

Characters with special meanings in regular expressions can be escaped by preceding them with a backslash. The .NET regular expression classes include the Regex.Escape(...) method to escape such characters in a string.
Regex.Escape(...) should probably be applied to the input string before the GeneralHelper... method is called. If the GeneralHelper... method only does what is shown in the question (i.e. adds ^ and $ and replaces % with .*? or .*), then the code below may be sufficient.
string regexSearch = GeneralHelper.RegexWildCardSearchPattern(
    Regex.Escape(requestDto.Name));
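The body of GeneralHelper.RegexWildCardSearchPattern is not shown in the question, so the following is only a sketch of the escape-then-convert approach, with a hypothetical stand-in method. It relies on the fact that Regex.Escape does not touch the % character, so the wildcard can still be replaced afterwards. For simplicity it always emits .* rather than the mix of .*? and .* from the question.

```csharp
using System.Text.RegularExpressions;

public static class WildcardSketch
{
    // Hypothetical stand-in for GeneralHelper.RegexWildCardSearchPattern:
    // escapes regex metacharacters first, then turns each '%' into ".*".
    public static string ToRegexPattern(string likePattern)
    {
        // Regex.Escape leaves '%' untouched, so it survives to the Replace call.
        string escaped = Regex.Escape(likePattern);
        return "^" + escaped.Replace("%", ".*") + "$";
    }

    // Case-insensitive test of 'value' against a SQL LIKE-style pattern.
    public static bool IsLike(string value, string likePattern)
    {
        return Regex.IsMatch(value, ToRegexPattern(likePattern),
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

With this, a search for %????% becomes ^.*\?\?\?\?.*$, which is a valid pattern and matches a record named ????.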

Regex in a string

I need some help with a problem.
I want to check an image's type by its hexadecimal signature.
string JpgHex = "FF-D8-FF-E0-xx-xx-4A-46-49-46-00";
Then I have a condition on
string.StartsWith(pngHex).
The problem is that the "x" characters in my "JpgHex" string can be anything.
I think I need a regex to check that, but I don't know how!
Thanks a lot!

I'm not quite clear what exactly you want to do, but the dot '.' character represents any character in Regex.
So the regex "^FF-D8-FF-E0-..-..-4A-46-49-46-00" will probably do the trick. '^' = Start of input.
If you want to allow only hex chars you can use "^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00".
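A possible way to wire the hex-only pattern into code is sketched below. The class and method names are hypothetical, and the byte[] overload assumes the hex string was produced with BitConverter.ToString, which yields upper-case pairs separated by '-'. The pattern has no trailing $, so extra bytes after the signature are allowed, mirroring the StartsWith check in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JpegSignatureSketch
{
    // The two "xx" bytes are restricted to upper-case hex digits.
    private static readonly Regex JfifHeader = new Regex(
        "^FF-D8-FF-E0-[0-9A-F]{2}-[0-9A-F]{2}-4A-46-49-46-00");

    // Checks a dash-separated hex string such as "FF-D8-FF-E0-00-10-...".
    public static bool LooksLikeJfif(string hex)
    {
        return JfifHeader.IsMatch(hex);
    }

    // Convenience overload: formats the bytes as "FF-D8-..." and checks them.
    public static bool LooksLikeJfif(byte[] data)
    {
        return LooksLikeJfif(BitConverter.ToString(data));
    }
}
```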

Like I said, I'd need a better idea of what pattern you need to match.
Here are some examples:
Regex rgx = new Regex(@"^FF-D8-FF-E0-[a-zA-Z0-9]{2}-[a-zA-Z0-9]{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex); // IsMatch returns a bool.
I use [a-zA-Z0-9]{2} to denote two instances of a character, caps or small or a number. So the above regex would match :
FF-D8-FF-E0-aa-zZ-4A-46-49-46-00
FF-D8-FF-E0-11-22-4A-46-49-46-00
.. etc
Based on your need change the regex accordingly so for capitals and numbers only you change to [A-Z0-9]. The {2} denotes two occurrences.
The ^ denotes the string should start with FF and $ means the string should end with 00.
Let's say you wanted to match only two digits, so you would use \d{2}. The whole thing would look like this:
Regex rgx = new Regex(@"^FF-D8-FF-E0-\d{2}-\d{2}-4A-46-49-46-00$");
rgx.IsMatch(pngHex);
How do I know of these magical characters? Simple, there are docs everywhere. See this MSDN page for some basic regex patterns. This page shows some quantifiers, those are things like match one or more or match only one.
Cheat-sheets also come in handy.

A regex would help you; you can use the following tool to help you test and learn: -
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
I recommend you have a play because then you'll learn!
To simply match any character in place of the x, the following should work: -
"^FF-D8-FF-E0-..-..-4A-46-49-46-00$"
In C#, it would be something like this: -
var test = "FF-D8-FF-E0-AB-CD-4A-46-49-46-00";
var foo = new Regex("^FF-D8-FF-E0-..-..-4A-46-49-46-00$");
if (foo.IsMatch(test))
{
    // Do magic
}
You will need to read up on regular expressions to understand some of the characters that may not look familiar, i.e. ^ and $. See http://www.regular-expressions.info/

C# regex match behaviour

I've got this line in my code:
Match match = Regex.Match(actualValue, regexValue, RegexOptions.None);
I have a simple question: why does the match succeed, when checked with the line:
if(match.Success)
for the following values:
actualValue = "G:1"
regexValue = "A*"
The actual value does not seem to fit the pattern, at least to me, so I'm probably missing something.
What I want to achieve is to take an actual value and a regular expression and check whether the actual value fits the regular expression. I thought that's what I did there, but apparently I didn't.
EDIT: Another question: is there a way to treat * as the "any char" wildcard? That is, can A* be read as A followed by any characters?

Your code itself is correct; your regular expression isn't.
Based on your comments on other answers, you're after a regular expression which matches any string which starts with A, and you're assuming that '*' means "any characters". '*' in fact means "match the preceding character zero or more times", so the regular expression you've given means "match zero or more 'A' characters". Because zero occurrences are allowed and the pattern is not anchored, it finds an empty match at the start of absolutely any input.
If you're looking for a regular expression that matches the whole string but only if it starts with 'A', the regular expression you're after is ^A.*. The '.' character in a regular expression means "match any character". This regular expression thus means "match the start of the string, followed by an 'A', followed by zero or more other characters" and will thus match the entire string provided it starts with 'A'.
However, you already have the whole string, so this is a little unnecessary - all you really want to do is get an answer to the question "does the string start with an 'A'?". A regular expression that will achieve this is simply '^A'. If it matches, the string started with an 'A'.
Of course, it should be pointed out that you don't need a regular expression to confirm this anyway. If this is genuinely all you want to do (and it's possible you've just put together a simple example, and your real scenario is more complicated), why not just use the StartsWith method?:
bool match = actualValue.StartsWith("A");
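To see the difference between these patterns, a small sketch like the following can help. The helper name is hypothetical; it only reports what Regex.Match found and where.

```csharp
using System.Text.RegularExpressions;

public static class StarQuantifierSketch
{
    // Describes the first match of 'pattern' in 'input', or reports that there is none.
    public static string Describe(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? $"match '{m.Value}' at {m.Index}" : "no match";
    }
}
```

Describe("G:1", "A*") reports an empty match at index 0, which is why match.Success is true in the question, whereas Describe("G:1", "^A") reports no match.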

The regex matches because A* means "look for 0 or more occurrences of 'A'". It will match any string.
If you meant to look for an arbitrary number of 'A', but at least one, try A+ instead.

Looking at the comments it looks like you're trying to match a lot of strings starting with A.
If they're separated by white space you could find all of them using the following:
bool matched = Regex.IsMatch(actualValue, @"\bA\w+");
This matches : "Atest flkjs Apple Ascii cAse".
If there is only one string you're matching and it starts with A and has no spaces:
bool matched = Regex.IsMatch(actualValue, @"^A\w+$");
This matches "Apple", but not "Apple and orange" as the second string has spaces.
As Chris noted * is not a wildcard in the way you meant with regex searches. You can find some information to get you started with regexes at regex-info.

The Regex class takes the regular expression in its constructor.
An example in your case could be:
if (new Regex("A*").IsMatch(actualValue))
{
    // Do something
}
If you are unsure of the regex pattern, try it out here

Regular Expression in .NET

I need to write a regex pattern in C# that checks whether an input string contains certain characters and does not contain others. For example:
I want the string to contain only a-z, but not d or b, and the length of the whole string to be longer than 5. I wrote "[a-z]{5,}"; how can I prevent the input from containing d and b?
Additional question: can I make part of the regex conditional? In other words, if some boolean variable is true, check something, and if it is false, don't check it?
Thanks

simple regex:
/[ace-z]{5}/
matches five occurrences of: characters 'a', 'c', or any character between 'e' and 'z'.

A useful regex resource I always use is:
http://regexlib.com/
Helped me out many times.

For the first question, why not simply try this: [ace-z]{5,} ?
For the second question, can't you build the regex string based on the boolean variable before executing it?
Or, if you want to exclude some characters programmatically, you can build the regex in code by listing all the allowed characters explicitly ([abcdefgh....]) and leaving out the excluded ones.
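One way to read the suggestion of building the pattern from a boolean is sketched below. The flag name excludeDAndB and the class are hypothetical. The ^ and $ anchors are added so that the whole input, not just a five-character stretch of it, has to satisfy the character class.

```csharp
using System.Text.RegularExpressions;

public static class ConditionalPatternSketch
{
    // 'excludeDAndB' is a hypothetical flag standing in for the boolean in the question.
    public static bool IsValid(string input, bool excludeDAndB)
    {
        // Pick the character class based on the flag.
        string charClass = excludeDAndB ? "[ace-z]" : "[a-z]";

        // {5,} requires at least five characters from the class, start to end.
        string pattern = "^" + charClass + "{5,}$";
        return Regex.IsMatch(input, pattern);
    }
}
```

The quantifier {5,} is the one used in the question and accepts five or more characters; if the length must be strictly greater than five, {6,} would be the quantifier to use.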

if you want to skip d and b
[ace-z]{5,}
And yes, you can have a boolean check using the IsMatch method of the Regex class:
Regex regex = new Regex("^[ace-z]{5,}$");
if (regex.IsMatch(textBox1.Text))
{
    errorProvider1.SetError(textBox1, String.Empty);
}
else
{
    errorProvider1.SetError(textBox1, "Invalid entry");
}

Does this regex expression allow "*"?

I really know very little about regex's.
I'm trying to test a password validation.
Here's the regex that describes it (I didn't write it, and don't know what it means):
private static string passwordField = "[^A-Za-z0-9_.\\-!@#$%^&*()=+;:'\"|~`<>?\\/{}]";
I've tried a password like "dfgbrk*", and my code, using the above regex, allowed it.
Is this consistent with what the regex defines as acceptable, or is it a problem with my code?
Can you give me an example of a string that validation using the above regex isn't supposed to allow?
Added: Here's how the original code uses this regex (and it works there):
public static bool ValidateTextExp(string regexp, string sText)
{
    if (sText == null)
    {
        Log.WriteWarning("ValidateTextExp got null text to validate against regExp {0} . returning false", regexp);
        return false;
    }
    return (!Regex.IsMatch(sText, regexp));
}
It seems I'm doing something wrong..
Thanks.

Your regex matches a value that contains any single character which is not in that list.
Your test value matches because it has spaces in it, which do not appear to be in your expression.
The reason it matches characters that are not in the list is that your character class starts with ^, which negates it. The reason it matches any value that contains a single such character is that you did not specify the beginning or end of the string, or any quantifiers.
The above assumes I'm not missing the importance of any of the characters in the middle of the character soup :)
This answer is also dependent on how you actually use the Regex in code.
If your intention was for that Regex string to represent the only characters that are actually allowed in a password, you would change the regex like so:
string pattern = "^[A-Z0-9...etc...]+$";
The important parts there are:
The ^ has been removed from inside the bracket, to outside; where it signifies the start of the whole string.
The $ has been added to the end, where it signifies the end of the whole string.
Those are needed because otherwise, your pattern will match anything that contains the valid values anywhere inside - even if invalid values are also present.
Finally, I've added the + quantifier, which means you want to find any one of those valid characters, one or more times. (This regex would not permit a 0-length password.)
If you wanted to permit the ^ character also as part of the password, you would add it back in between the brackets, but just not as the first thing right after the opening bracket [. So for example:
string pattern = "^[A-Z0-9^...etc...]+$";
The ^ has special meaning in different places at different times in Regexes.

[^A-Za-z0-9_.\-!@#$%^&*()=+;:'\"|~`<>?\/{}]
----------------------^
Looks fine to me, at least in regards to your question title. I'm not clear yet on why the spaces in your sample don't trip it up.
Note that I'm assuming the purpose of this expression is to find invalid characters. Thus, if the expression is a positive match, you have a bad password that you must reject. Since there appears to be some confusion about this, perhaps I can clear it up with a little pseudo-code:
bool isGoodPassword = !Regex.IsMatch(requestedPassword, @"[^A-Za-z0-9_.\-!...]");
You could rewrite this for a positive match (without the negation) like so:
bool isGoodPassword = Regex.IsMatch(requestedPassword, @"^[A-Za-z0-9_.\-!...]+$");
The new expression matches a string that, from beginning to end, consists of one or more of the characters in the list. Any character not in the list would cause the match to fail.
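The two styles can be compared side by side in a short sketch. The allow-list below is a shortened, hypothetical one (letters, digits, and a few symbols including *), not the full list from the question, and the class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PasswordCharsetSketch
{
    // Shortened, hypothetical allow-list; the real list from the question is longer.
    private const string Allowed = @"A-Za-z0-9_.\-!*";

    // Style 1: search for any character outside the allow-list and negate the result.
    public static bool IsValidByNegation(string password)
    {
        return !Regex.IsMatch(password, "[^" + Allowed + "]");
    }

    // Style 2: require every character, from start to end, to come from the allow-list.
    public static bool IsValidByWhitelist(string password)
    {
        return Regex.IsMatch(password, "^[" + Allowed + "]+$");
    }
}
```

Both accept "dfgbrk*" and reject "dfg brk" because of the space. They differ on the empty string: the negated form accepts it (there is no invalid character to find), while the positive form rejects it because + requires at least one character.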

Your regular expression is just an inverted character class and describes just one single character (but that can’t be *). So it depends on how you use that character class.

Depends on how you apply it. It describes exactly one character. However, the ^ at the beginning bugs me a little, as it prohibits every other character, so there is probably something terribly fishy there.
Edit: as pointed out in other answers, the reason for your string to match is the space, not the explanation that was replaced by this line.
