Does this regex expression allow "*"? - c#

I really know very little about regexes.
I'm trying to test a password validation.
Here's the regex that describes it (I didn't write it, and don't know what it means):
private static string passwordField = "[^A-Za-z0-9_.\\-!@#$%^&*()=+;:'\"|~`<>?\\/{}]";
I've tried a password like "dfgbrk*", and my code, using the above regex, allowed it.
Is this consistent with what the regex defines as acceptable, or is it a problem with my code?
Can you give me an example of a string that validation using the above regex isn't supposed to allow?
Added: Here's how the original code uses this regex (and it works there):
public static bool ValidateTextExp(string regexp, string sText)
{
if ( sText == null)
{
Log.WriteWarning("ValidateTextExp got null text to validate against regExp {0} . returning false",regexp);
return false;
}
return (!Regex.IsMatch(sText, regexp));
}
It seems I'm doing something wrong..
Thanks.

Your regex matches a value that contains any single character which is not in that list.
Your test value matches because it has spaces in it, which do not appear to be in your expression.
The reason the list is negated is that your character class starts with ^. The reason it matches any value that contains at least one character outside the list is that you did not specify the beginning or end of the string, or any quantifiers.
The above assumes I'm not missing the importance of any of the characters in the middle of the character soup :)
This answer is also dependent on how you actually use the Regex in code.
If your intention was for that Regex string to represent the only characters that are actually allowed in a password, you would change the regex like so:
string pattern = "^[A-Z0-9...etc...]+$";
The important parts there are:
The ^ has been removed from inside the bracket, to outside; where it signifies the start of the whole string.
The $ has been added to the end, where it signifies the end of the whole string.
Those are needed because otherwise, your pattern will match anything that contains the valid values anywhere inside - even if invalid values are also present.
Finally, I've added the + quantifier, which means you want to find any one of those valid characters, one or more times. (This regex would not permit a 0-length password.)
If you wanted to permit the ^ character also as part of the password, you would add it back in between the brackets, but just not as the first thing right after the opening bracket [. So for example:
string pattern = "^[A-Z0-9^...etc...]+$";
The ^ has special meaning in different places at different times in Regexes.
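A minimal sketch of the two styles discussed above, assuming a shortened character list (letters, digits, and a handful of symbols) rather than the full list from the question. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PasswordCheck
{
    // Hypothetical, shortened list of allowed characters (not the full list from the question).
    // The trailing ^ is a literal caret because it is not the first character of the class.
    private const string Allowed = @"A-Za-z0-9_.\-!*^";

    // Negated-class style: valid when NO character outside the list is found.
    public static bool IsValidByNegation(string text) =>
        !Regex.IsMatch(text, "[^" + Allowed + "]");

    // Anchored style: valid when the whole string is one or more listed characters.
    public static bool IsValidByAnchors(string text) =>
        Regex.IsMatch(text, "^[" + Allowed + "]+$");
}
```

With this list, "dfgbrk*" is accepted by both methods and "dfg brk" is rejected by both, because the space is not listed. The two differ on the empty string: the negated form accepts it, while the + quantifier rejects it. Note also that in .NET, $ matches before a final newline as well, so \z may be the safer end anchor if that matters.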

[^A-Za-z0-9_.\-!@#$%^&*()=+;:'\"|~`<>?\/{}]
----------------------^
Looks fine to me, at least in regards to your question title. I'm not clear yet on why the spaces in your sample don't trip it up.
Note that I'm assuming the purpose of this expression is to find invalid characters. Thus, if the expression is a positive match, you have a bad password that you must reject. Since there appears to be some confusion about this, perhaps I can clear it up with a little pseudo-code:
bool isGoodPassword = !Regex.IsMatch(requestedPassword, @"[^A-Za-z0-9_.\-!...]");
You could rewrite this for a positive match (without the negation) like so:
bool isGoodPassword = Regex.IsMatch(requestedPassword, @"^[A-Za-z0-9_.\-!...]+$");
The new expression matches a string that, from beginning to end, consists of one or more of the characters in the list. Any character not in the list would cause the match to fail.

Your regular expression is just an inverted character class and describes just one single character (but that can’t be *). So it depends on how you use that character class.

Depends on how you apply it. It describes exactly one character; however, the ^ at the beginning bugs me a little, as it prohibits every other character, so there is probably something terribly fishy there.
Edit: as pointed out in other answers, the reason for your string to match is the space, not the explanation that was replaced by this line.

Related

ASP.net core RegularException attribute - multiple conditions

I have two regex that should be matched:
"^[a-z0-9\\!#\\$\\^&\\-\\+%\\=_\\(\\)\\{\\}\\<\\>'\";\\:/\\.,~`\\|\\\\]+$"
and
".*(g[o0]+gle).*"
The first one accepts any alphanumeric character (with a few more extras), like helloworld123. The second one should reject any string that contains the word "google" (in different forms, like: gooo0gle).
Allowed:
hello
helloworld
helloworld123
Disallowed:
hellogoogle
google
...
I want to use the RegularExpression to match this string. Thought about something like:
[RegularExpression("^[a-z0-9\\!#\\$\\^&\\-\\+%\\=_\\(\\)\\{\\}\\<\\>'\";\\:/\\.,~`\\|\\\\]+$|.*(g[o0]+gle).*"]
But it's not working since the second part (.*(g[o0]+gle).*) should be NOT.
How to do it right?
Thanks.
You can use your second regex by placing it in a negative look ahead and use the first regex as character set and combine both to get following regex that you can use,
^(?!.*g[o0]+gle)[-a-z0-9!#$^&+%=_(){}<>'";:\/.,~`|]+$
Here, the negative lookahead (?!.*g[o0]+gle) rejects any string that contains google or any variation supported by your regex, and the character set [-a-z0-9!#$^&+%=_(){}<>'";:\/.,~`|]+ matches one or more of the characters it allows.
Also, you don't need to escape most special characters while they are in character set, hence I have unescaped most of them except / and also always place the hyphen - either as the very first character or very last character in the character set, else depending upon the regex dialects, you may see weird behavior.
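As a rough sketch, the combined pattern from this answer can be tried out with the plain Regex class before putting it into the attribute. The class and method names here are made up for illustration, and the pattern is copied from the answer above (with / left unescaped, which .NET does not require).

```csharp
using System.Text.RegularExpressions;

public static class NameFilter
{
    // Lookahead first: fail if any "g[o0]+gle" variant occurs anywhere in the string.
    // Then require one or more characters from the allowed set, up to the end.
    private static readonly Regex Pattern =
        new Regex(@"^(?!.*g[o0]+gle)[-a-z0-9!#$^&+%=_(){}<>'"";:/.,~`|]+$");

    public static bool IsAllowed(string text) => Pattern.IsMatch(text);
}
```

With this sketch, hello, helloworld and helloworld123 are allowed, while hellogoogle, google and gooo0gle are rejected by the lookahead.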

Finding the beginning and end of a substring using regex

I have an awful time with regular expressions, so I usually resort to lousy kludges and workarounds when parsing strings. I need to get better at using regex. This one seems simple to me, but I don't even know where to start.
Here's the string output from my device:
testString = IP:192.168.5.210\rPlaylist:1\rEnable:On\rMode:HDMI\rLineIn:unbal\r
Example:
I want to find if the device is off or on. I need to search for the string "Enable:" then locate the carriage return and determine if the word between Enable: and \r is off or on. It seems like that's what regex is for or do I totally misunderstand it.
Can someone point me in the right direction?
Additional information - Maybe I need to expand on the question.
Based on the answers, finding whether or not the device is enabled appears to be fairly simple. Since the return string is similar to a list of key/value pairs, what's more vexing is determining the substring between the : and the carriage return. A number of these pairs have values whose lengths vary significantly, such as DeviceLocation, DeviceName, IPAddress. In fact, the device responds to every command sent to it by returning the entire status list, 48 key/value pairs, which I then must parse even if I only need to know one property.
Also, based on your answers, regular expressions are not the way to go.
Thanks for any help.
Norm
I would suggest for a simple line as shown, ask for one or the other, but verify as well. Based partially off Ken White's suggestions.
if(input.Contains(":On")){
//DoWork()
}else{
if(input.Contains(":Off"))
//DoOtherWork
}
This presumes that ":On" and ":Off" will not appear anywhere else in the string, even with a different string.
Consider the following code:
// This regular expression matches the text 'Enable: ' followed by one or more non-'\r' characters, followed by '\r'.
// Note: this sample input has a space after each colon; drop the space from the pattern if the device sends 'Enable:On'.
// RegexOptions.Multiline is optional but MAY be necessary on other platforms.
// Also, '\r' is not a line break. '\n' is.
Regex regex = new Regex("Enable: ([^\r]+)\r", RegexOptions.Multiline);
string input = "IP:192.168.5.210\rPlaylist: 1\rEnable: On\rMode: HDMI\rLineIn: unbal\r";
var match = regex.Match(input);
Debug.Assert(match.Success);
// The match will contain 2 Groups:
// The first (index 0) is the whole match, 'Enable: On\r'.
// The other (index 1) is 'On', since we enclosed [^\r]+ in ().
Console.WriteLine(match.Groups[1]);
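For the follow-up about parsing the whole status list, here is a minimal sketch without regex, assuming every entry has the form Key:Value and ends with '\r' as in the sample string from the question. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class DeviceStatus
{
    // Splits "Key:Value\rKey:Value\r..." into a case-insensitive dictionary.
    public static Dictionary<string, string> Parse(string response)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var entry in response.Split(new[] { '\r' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the FIRST colon only, so values that contain ':' stay intact.
            int colon = entry.IndexOf(':');
            if (colon <= 0) continue; // skip entries without a key
            result[entry.Substring(0, colon)] = entry.Substring(colon + 1).Trim();
        }
        return result;
    }
}
```

Because the value is everything after the first colon, its length does not matter, and a lookup such as Parse(testString)["Enable"] works the same way for any of the returned properties.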

Extracting a number from between parenthesis

I have a regular expression designed to extract a number from between two parentheses. It had been working fine until we made the input string customizable. Now, if a number is found somewhere else in the string, the last number is taken. My expression is below:
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"(\d+)(?!.#\d)", RegexOptions.RightToLeft).Value);
If I send the string This (12) is a test, it works fine, extracting the 12. However, if I send This (12) is a test2, the result is 2. I realize I can change the RightToLeft to LeftToRight, which will fix this instance, but I only want to get the number between the parentheses.
I am sure this will be easy for anyone with any regular expression experience (which is obviously not me). I am hoping you could show me how to correct this to get what I want, but also give a brief explanation of what I am doing wrong so I can hopefully improve.
Thank you.
Additional Information
I appreciate all of the responses. I have taken the agreed upon advice, and tried each of these formats:
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"(\(\d+\))(?!.#\d)", RegexOptions.RightToLeft).Value);
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"(\(\d+\))", RegexOptions.RightToLeft).Value);
int icorrespid = Convert.ToInt32(Regex.Match(subject, @"\(\d+\)", RegexOptions.RightToLeft).Value);
Unfortunately, with each, I get an exception stating that the input string was not in a correct format. I did not get that before. I'm sure that I could resolve this without using a regular expression in a minute or two, but my stubbornness has kicked in.
Thank you everyone for your comments.
You need to escape parentheses in regex, because they have a special meaning:
@"(\(\d+\))(?!.#\d)"
or, if you didn't actually intend your number to be caught in a group:
@"\(\d+\)(?!.#\d)"
Try this regular expression:
\(#(\d+)\)
The brackets are escaped \( and \) and inside them is the normal search for numbers.
If you use the .Value property, it will give you the number surrounded by brackets. Instead you need to use the Groups collection. So to use in your code, you do this: (now with added error checking!)
var match = Regex.Match(subject, @"\(#(\d+)\)", RegexOptions.RightToLeft).Groups[1].Value;
if(!string.IsNullOrEmpty(match))
{
var icorrespid = Convert.ToInt32(match);
}
else
{
//No match found
}
Use:
\(\d+\)(?!.#\d)
( and ) are reserved characters that delimit a capture group.
Parentheses have a meaning in regex, so you need to escape them:
\(\d+\)
The actual meaning is to create a capture group, so if you're relying on a capture group in your code, you need another pair of parentheses like this:
\((\d+)\)
I'm not quite sure what the purpose of the (?!.#\d) part is from your question, but if you do need it, you can leave it where it is (just append it to the end of either of the versions above)
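A minimal sketch that puts these pieces together, assuming the number always appears as plain digits directly inside one pair of parentheses. The helper name is made up, and unlike the original code it takes the first match rather than searching right to left.

```csharp
using System.Text.RegularExpressions;

public static class CorrespondenceId
{
    // Returns the number inside the first "(digits)" occurrence, or null if there is none.
    public static int? Extract(string subject)
    {
        // \( and \) match literal parentheses; the inner (...) captures only the digits.
        Match match = Regex.Match(subject, @"\((\d+)\)");
        if (match.Success && int.TryParse(match.Groups[1].Value, out int id))
            return id;
        return null;
    }
}
```

Reading Groups[1] instead of Value is what avoids the format exception: Value would be "(12)" including the parentheses, which Convert.ToInt32 cannot parse.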

Regex matching only when condition has not been met

I have kind of a weird problem that I am attempting to resolve with some elegant regular expressions.
The system I am working on was initially designed to accept an incoming string and through a pattern matching method, alter the string which it then returns. A very simplistic example is:
Incoming string:
The dog & I went to the park and had a great time...
Outgoing string:
The dog {&} I went to the park and had a great time {...}
The punctuation mapper wraps key characters or phrases in curly braces. The original implementation was a one-way street and was never meant for how it is currently being applied; as a result, if it is called incorrectly, it is very easy for the system to "double" wrap a string, as it is just doing a simple string replace.
I spun up Regex Hero this morning and started working on some pattern matches and having not written a regular expression in nearly a year, quickly hit a wall.
My first idea was to match a character (e.g. &) but only if it wasn't wrapped in braces, and I came up with [^\{]&[^\}]. That is great, but of course it catches any instance of the ampersand so long as it is not preceded by a curly brace, including the surrounding white space, and it would not work in a situation where there were two ampersands back to back (i.e. && would need to be {&}{&} in the outgoing string). To make matters more complicated, it is not always a single character, as the ellipsis (...) is also one of the mapped values.
Every solution I noodle over hits a barrier: either there is an unknown number of occurrences of a particular value in the string, or the capture groups are too greedy, or it cannot compensate for multiple values back to back (e.g. a single period . vs. an ellipsis ...). The original dev handled the last case by processing the ellipsis first, which covered the period in the string replace implementation.
Are there any regex gurus out there that have any ideas on how I can detect the undecorated (unwrapped) values in a string and then perform their replacements in an ungreedy fashion that can also handle multiple repeated characters?
My datasource that I am working against is a simple key value pair that contains the value to be searched for and the value to replace it with.
Updated with example strings:
Undecorated:
Show Details...
Default Server:
"Smart" 2-Way
Show Lender's Information
Black & White
Decorated:
Show Details{...}
Default Server{:}
{"}Smart{"} 2-Way
Show Lender{'}s Information
Black {&} White
Updated With More Concrete Examples and Datasource
Datasource (SQL table, can grow at any time):
TaggedValue UntaggedValue
{:} :
{&} &
{<} <
{$} $
{'} '
{} \
{>} >
{"} "
{%} %
{...} ...
{...} …
{:} :
{"} “
{"} ”
{'} `
{'} ’
Broken String: This is a string that already has stuff {&} other stuff{!} and {...} with {_} and {#} as well{.} and here are the same characters without it & follow by ! and ... _ & . &&&
String that needs decoration: Show Details... Default Server: "Smart" 2-Way Show Lender's Information Black & White
String that would pass through the method untouched (because it was already decorated): The dog {&} I went to the park and had a great time {...}
The other "gotcha" in moving to regex is the need to handle escaping, especially of backslashes elegantly due to their function in regular expressions.
Updated with output from @Ethan Brown
@Ethan Brown,
I am starting to think that regex, while elegant, might not be the way to go here. The updated code you provided, while closer, still does not yield correct results, and the number of variables involved may exceed the regex logic's capability.
Using my example above:
'This is a string that already has stuff {&} other stuff{!} and {...} with {_} and {#} as well{.} and here are the same characters without it & follow by ! and ... _ & . &&&'
yields
This is a string that already has stuff {&} other stuff{!} and {...} with {_} and {#} as well{.} and here are the same characters without it {&} follow by {!} and {...} {_} {&} . {&&}&
Where the last group of ampersands which should come out as {&}{&}{&} actually comes out as {&&}&.
There is so much variability here (i.e. need to handle ellipsis and wide ellipsis from far east languages) and the need to utilize a database as the datasource is paramount.
I think I am just going to write a custom evaluator which I can easily enough write to perform this type of validation and shelve the regex route for now. I will grant you credit for your answer and work as soon as I get in front of a desktop browser.
This kind of problem can be really tough, but let me give you some ideas that might help out. One thing that's really going to give you headaches is handling the case where the punctuation appears at the beginning or end of the string. Certainly that's possible to handle in a regex with a construct like (^|[^{])&($|[^}]), but in addition to that being painfully hard to read, it also has efficiency issues. However, there's a simple way to "cheat" and get around this problem: just pad your input string with a space on either end:
var input = " " + originalInput + " ";
When you're done you can just trim. Of course if you care about preserving input at the beginning or end, you'll have to be more clever, but I'm going to assume for argument's sake that you don't.
So now on to the meat of the problem. Certainly, we can come up with some elaborate regular expressions to do what we're looking for, but often the answer is much much simpler if you use more than one regular expression.
Since you've updated your question with more characters, and more problem inputs, I've updated this answer to be more flexible: hopefully it will meet your needs better as more characters get added.
Looking over your input space, and the expressions you need quoted, there are really three cases:
Single-character replacements (! becomes {!}, for example).
Multi-character replacements (... becomes {...}).
Backslash replacement (\ becomes {})
Since the period is included in the single-character replacements, order matters: if you replace all the periods first, then you will miss ellipses.
Because I find the C# regex library a little clunky, I use the following extension method to make this more "fluent":
public static class StringExtensions {
public static string RegexReplace( this string s, string regex, string replacement ) {
return Regex.Replace( s, regex, replacement );
}
}
Now I can cover all of the cases:
// putting this into a const will make it easier to add new
// characters in the future
const string normalQuotedChars = @"\!_\\:&<\$'>""%:`";
var output = s
.RegexReplace( "(?<=[^{])\\.\\.\\.(?=[^}])", "{$&}" )
.RegexReplace( "(?<=[^{])[" + normalQuotedChars + "](?=[^}])", "{$&}" )
.RegexReplace( "\\\\", "{}" );
So let's break this solution down:
First we handle the ellipses (which will keep us from getting in trouble with periods later). Note that we use zero-width assertions at the beginning and end of the expression to exclude expressions that are already quoted. The zero-width assertions are necessary, because without them, we'd get into trouble with quoted characters right next to each other. For example, if you have the regex ([^{])!([^}]), and your input string is foo !! bar, the match would include the space before the first exclamation point and the second exclamation point. A naive replacement of $1{!}$2 would therefore yield foo {!}! bar, because the second exclamation point would have been consumed as part of the match. You'd have to end up doing an exhaustive match, and it's much easier to just use zero-width assertions, which are not consumed.
Then we handle all of the normal quoted characters. Note that we use zero-width assertions here for the same reasons as above.
Finally, we can find lone backslashes (note we have to escape them twice: once for C# strings and again for regex metacharacters) and replace them with empty curly brackets.
I ran all of your test cases (and a few of my own invention) through this series of matches, and it all worked as expected.
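To make the difference between consuming and zero-width matching concrete, here is a small sketch of the foo !! bar example from the explanation above. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Consuming version: the characters on both sides become part of the match,
    // so a neighbouring '!' is swallowed and never gets its own braces.
    public static string WrapConsuming(string s) =>
        Regex.Replace(s, @"([^{])!([^}])", "$1{!}$2");

    // Zero-width version: the neighbours are only inspected, not consumed,
    // so every '!' is matched on its own.
    public static string WrapLookaround(string s) =>
        Regex.Replace(s, @"(?<=[^{])!(?=[^}])", "{$&}");
}
```

For the input foo !! bar, the consuming version produces foo {!}! bar and the lookaround version produces foo {!}{!} bar. An already wrapped {!} is left alone by both.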
I'm no regex god, so one simple way:
Get / construct the final replacement string(s) - ex. "{...}", "{&}"
Replace all occurrences of these in the input with a reserved char (unicode to the rescue)
Run your matching regex(es) and put "{" or whatever desired marker(s).
Replace reserved char(s) with the original string.
Ignoring the case where your original input string has a { or } character, a common way to avoid re-applying a regex to an already-escaped string is to look for the escape sequence and remove it from the string before applying your regex to the remainders. Here's an example regex to find things that are already escaped:
Regex escapedPattern = new Regex(@"\{[^{}]*\}"); // consider adding RegexOptions.Compiled
The basic idea of this negative-character class pattern comes from regular-expressions.info, a very helpful site for all thing regex. The pattern works because for any inner-most pair of braces, there must be a { followed by non {}'s followed by a }
Run escapedPattern on the input string; for each Match, get the start and end indices in the original string and substring them out. Then, with the final cleaned string, run your original pattern match again or use something like the following:
Regex punctPattern = new Regex(@"[^\w\d\s]+"); // this assumes all non-word,
// digit or space chars are punctuation, which may not be a correct
//assumption
And replace Match.Value for each match with "{" + Match.Value + "}" (groups are a 0-based array where 0 is the whole match, 1 is the first set of parentheses, 2 is the next, etc.; the pattern above has no parentheses, so only group 0 exists).
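A related single-pass variation on this idea, as a sketch only: instead of cutting the already wrapped spans out first, match them as the first alternative and return them unchanged from a MatchEvaluator. The token list here is a small hypothetical sample, not the full table from the question.

```csharp
using System.Text.RegularExpressions;

public static class Decorator
{
    // Alternation order matters: already wrapped spans first, then the
    // multi-character ellipsis, then single characters (including the period).
    private static readonly Regex Token =
        new Regex(@"\{[^{}]*\}|\.\.\.|[&!:.]");

    // Wrapped spans are returned as-is; everything else gets braces.
    public static string Decorate(string s) =>
        Token.Replace(s, m => m.Value[0] == '{' ? m.Value : "{" + m.Value + "}");
}
```

Because each match is consumed exactly once, &&& becomes {&}{&}{&} and a string that is already decorated passes through untouched. If the tokens come from a database table, one option would be to build the alternation from the rows with Regex.Escape, longest values first.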

Excluding certain patterns in a regex

I'm working on a Regex in C# to exclude certain patterns within a string.
The patterns I want to accept are: "%00" (hex 00-FF) and any other character without a starting '%'. The patterns I would like to exclude are: "%0" (values with a starting % and only one character after) and/or the characters "&<>'/".
So far I have this
Regex correctStringRegex = new Regex(@"(%[0-9a-fA-F]{2})|[^%&<>'/]|(^(%.))",
RegexOptions.IgnoreCase);
Below are examples of what I'm trying to pass and reject.
Passing String %02This is%0A%0Da string%03
Reject String %0%0Z%A&<%0a%
If a string doesn't pass all the requirements I would like to reject the whole string completely.
Any Help would be greatly appreciated!
I suggest this:
^(?:%[0-9a-f]{2}|[^%&<>'/])*$
Explanation:
^ # Start of string
(?: # Match either
%[0-9a-f]{2} # %xx
| # or
[^%&<>'/] # any character except the forbidden ones
)* # any number of times
$ # until end of string.
This ensures that % is only matched when followed by two hexadecimals. Since you're already compiling the regex with the IgnoreCase flag set, you don't need a-fA-F, either.
Hmm, given the comments so far, I think you need a different problem definition. You want to pass or fail a string, using regex, based on whether or not the string contains any invalid patterns. I'm assuming a string will fail if there is ANY invalid pattern, rather than the reverse of a string passing if there is any valid pattern.
As such, I would use this regex: %(?![0-9a-f]{2})|[&<>'/]
You would then run this in such a way that a string is invalid if you GET a match, a valid string will not have any matches in this set.
A quick explanation of a rather odd regex. The format (?!) tells the regex "Match the previous symbol if the symbols in this set DON'T follow it", i.e. match if the suffix is not present. So, what I'm telling it to look for is any instance of % that is not followed by 2 hex characters, or any other invalid character. The assumption is that anything that DOESN'T match this regex is a valid character entry.
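A short sketch comparing the two approaches from this thread, the whole-string pattern and the "look for anything invalid" pattern, on the sample strings from the question. The class and method names are made up; both patterns are copied from the answers above and compiled with IgnoreCase.

```csharp
using System.Text.RegularExpressions;

public static class PercentValidator
{
    // Whole-string form: every piece must be %xx or an allowed character.
    private static readonly Regex WholeString =
        new Regex(@"^(?:%[0-9a-f]{2}|[^%&<>'/])*$", RegexOptions.IgnoreCase);

    // Inverse form: finds a % not followed by two hex digits, or a forbidden character.
    private static readonly Regex InvalidPart =
        new Regex(@"%(?![0-9a-f]{2})|[&<>'/]", RegexOptions.IgnoreCase);

    public static bool IsValidWhole(string s) => WholeString.IsMatch(s);

    // Valid when NO invalid part is found.
    public static bool IsValidByAbsence(string s) => !InvalidPart.IsMatch(s);
}
```

Both methods accept %02This is%0A%0Da string%03 and reject %0%0Z%A&<%0a%. The inverse form has the side benefit that its match tells you where the first offending piece is.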
