Star - look for the character * in a string using regex - c#

I am trying to find the following text in my string: '***'
The problem is that the C# Regex class doesn't allow me to do the following:
new Regex("***", RegexOptions.CultureInvariant | RegexOptions.Compiled);
because it throws
ArgumentException: "parsing "*" - Quantifier {x,y} following nothing."
Obviously it treats my stars as regex quantifiers.
Is there a way to tell the Regex engine to treat stars as just stars and nothing else?

In a regex, * means:
Matches the previous element zero or more times.
So you need to use \* or [*] instead.
Explanation:
\
When followed by a character that is not recognized as an escaped character in this and other tables in this topic, matches that character. For example, \* is the same as \x2A.
[ character_group ]
Matches any single character in character_group.

You need to escape the star with a backslash: @"\*"
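As a minimal sketch of the escaping options mentioned above, the three patterns below all match a literal run of three stars. The sample input string and variable names are made up for illustration; Regex.Escape is the standard .NET helper for escaping arbitrary literal text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input, not taken from the question.
string input = "before *** after";

// Escape each star with a backslash (verbatim string, so no doubled backslashes).
var backslashed = new Regex(@"\*\*\*", RegexOptions.CultureInvariant);

// Put each star in a character class, where it has no special meaning.
var bracketed = new Regex(@"[*][*][*]", RegexOptions.CultureInvariant);

// Let Regex.Escape escape the metacharacters of a literal string.
var escaped = new Regex(Regex.Escape("***"), RegexOptions.CultureInvariant);

Console.WriteLine(backslashed.Match(input).Value);
Console.WriteLine(bracketed.IsMatch(input));
Console.WriteLine(escaped.IsMatch(input));
```

Regex.Escape is usually the safer choice when the literal text comes from user input, since it covers every metacharacter rather than just the star.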

Related

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use the Regex.IsMatch() function. However, if I use RegularExpressionAttribute in ASP.NET MVC, I always get an invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
The RegularExpressionAttribute specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression.
Which means I need "abcdef" to match and ".abcdefg" to not match. Basically, I need to negate the whole expression I have above.
You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch) a negated character class will perform better:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape / and ? chars inside the .NET regex since you can't use regex delimiters there.
C# declaration of the last pattern will look like
@"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
See this .NET regex demo.
RegularExpressionAttribute:
[RegularExpression(
@"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")
]
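The difference between Regex.IsMatch() and a whole-string match can be sketched as follows. This assumes, as the answer above states, that the attribute only accepts a match covering the entire value. MatchesWholeString is a hypothetical helper written for this illustration, not part of the framework.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only when the first match spans the entire input.
static bool MatchesWholeString(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

// The pattern from the question and the extended pattern from the answer.
string original = @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$";
string extended = @"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*";

foreach (string s in new[] { "abcdefg", ".abcdefg", "abc.defg", "abcdefg.", "abc[defg" })
{
    // IsMatch accepts a partial match; the helper insists on the whole string.
    Console.WriteLine(
        $"{s,-10} IsMatch={Regex.IsMatch(s, original),-5} " +
        $"whole(original)={MatchesWholeString(original, s),-5} " +
        $"whole(extended)={MatchesWholeString(extended, s)}");
}
```

With the original pattern the whole-string check fails for every sample, which is consistent with the model always being reported as invalid.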
Your regex is an alternation of three parts, each matching a single character: the first is a character class listing several characters, the second is a dot at the start of the string, and the third is a dot at the end of the string.
Regex.IsMatch() works fine because one of the alternatives does match, but that match covers only part of the string, not the whole string you want to match.
You could use three alternatives where the first matches a dot followed by repetitions of the character class until the end of the string, and the second is the other way around, with the dot at the end of the string.
The third uses a positive lookahead asserting that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$
Regex demo

Cannot match parentheses in regex group

This is a regular expression, evaluated in .NET
I have the following input:
${guid->newguid()}
And I want to produce two matching groups, a character sequence after the ${ and before }, which are split by -> :
guid
newguid()
The pattern I am using is the following:
([^(?<=\${)(.*?)(?=})->]+)
But this doesn't match the parentheses, I am getting only the following matches:
guid
newguid
How can I modify the regex so I get the desired groups?
Your regex - ([^(?<=\${)(.*?)(?=})->]+) - matches one or more characters other than those defined in the negated character class (that is, 1 or more chars other than (, ?, <, etc).
I suggest using a matching regex like this:
\${([^}]*?)->([^}]*)}
See the regex demo
The results you need are in match.Groups[1] and match.Groups[2].
Pattern details:
\${ - match ${ literal character sequence
([^}]*?) - Group 1 capturing 0+ chars other than } as few as possible
-> - a literal char sequence ->
([^}]*) - Group 2 capturing 0+ chars other than } as many as possible
} - a literal }.
If you know that you only have word chars inside, you may simplify the regex to a mere
\${(\w+)->(\w+\(\))}
See the regex demo. However, it is much less generic.
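A minimal sketch of reading the two groups with the first suggested pattern, using the input from the question (the variable names are chosen for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "${guid->newguid()}";

// Group 1: text between "${" and "->"; Group 2: text between "->" and "}".
Match match = Regex.Match(input, @"\${([^}]*?)->([^}]*)}");

string first = match.Groups[1].Value;
string second = match.Groups[2].Value;

Console.WriteLine(first);
Console.WriteLine(second);
```

Because both groups exclude only the closing brace, the parentheses of newguid() end up inside the second group.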
Is your input structure always ${identifier->identifier()}? If so, you can use ^\$\{([^-]+)->([^}]+)\}$.
Otherwise, you can modify your regex to ([^?<=\${.*??=}\->]+): with this regex you should match the input and get the desired groups, guid and newguid(). The key change is escaping the - character, which is otherwise interpreted as a range operator and forces you to insert parentheses in your pattern. But [^......(....)....] excludes parentheses from the match.
I hope that helps!
EDIT: testing with https://regex101.com helped me a lot; it showed me that - was interpreted as a range operator.

Regular Expression that matches on values after a pipe in between brackets

I'm still learning a lot about regex, so please forgive any naivety.
I've been using this site to test:
http://www.systemtextregularexpressions.com/regex.match
Basically, I'm having issues writing a regular expression that will match on any value after a pipe in between brackets.
Given an example string of:
"<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>"
Expected output would be a collection:
match1,
match2
The closest I've been able to get so far is:
(?!\[.*(\|)\])(?:\|)([\w-_.,:']*)
Above gives me the values, including the pipes, and dont3.
I've also tried this guy:
\|(.*(?=\]))
but it outputs:
|match1|match2
Here's one way of doing it:
(?<=\[[^\]]*\|)[^\]|]*
Here's the meaning of the pattern:
(?<=\[[^\]]*\|) - Lookbehind expression to ensure that any match must be preceded by an open bracket, followed by any number of non-close-bracket characters, followed by a pipe character
(?<= ... ) - Declares a lookbehind expression. Something matching the lookbehind must immediately precede the text in order for it to match. However, the part matched by the lookbehind is not included in the resulting match.
\[ - Matches an open bracket character
[^\]]* - Matches any number of non-close-bracket characters
\| - Matches a pipe character
[^\]|]* - Matches any number of characters which are neither close brackets nor pipe characters.
The lookbehind is greedy, so it will allow for any number of pipes between the open bracket and the matching text.
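The lookbehind pattern can be tried against the example string from the question roughly like this. It is a sketch that relies on .NET supporting variable-length lookbehind; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>";

// Each match is a run of characters that follows a pipe inside an open bracket.
string[] values = Regex.Matches(input, @"(?<=\[[^\]]*\|)[^\]|]*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", values));
```

The pipe before dont3 is not picked up because a closing bracket sits between it and the nearest open bracket, so the lookbehind cannot succeed there.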
Try this:
\[.*?(?:\|(?<mydata>.*?))+\]
Note: the online tool will only show you the last capture inside a quantified () for a given match, but .NET will remember each capture of a group that matches multiple times.
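To illustrate the point about repeated captures, here is a small sketch that reads every capture of the mydata group through the Captures collection, using the example string from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>";

Match match = Regex.Match(input, @"\[.*?(?:\|(?<mydata>.*?))+\]");

// Groups["mydata"].Value holds only the last capture;
// Captures holds one entry per repetition of the group.
string[] captures = match.Groups["mydata"].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", captures));
```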
Try this:
^<div>\s*[^|]+|([^|]+)|([^|]+)

Using an escape character with a beginning wildcard in regex in c#

Below is a sample of an email I am using from a database:
2.2|[johnnyappleseed@example.com]
Every line is different, and it may or may not be an email, but it will always be enclosed in brackets. I am trying to use regular expressions to get the information inside the brackets. Below is what I have been trying to use:
^\[\]$
Unfortunately, every time I try to use it, the expression isn't matching. I think the problem is using the escape characters, but I am not sure. If this is not how I use the escape characters with this, or if I am wrong completely, please let me know what the actual regex should be.
Close to yours is ^.*\[(.*)\]$:
^ start of the line
.* anything
\[ a bracket, indicating the start of the email
(.*) anything (the email), as a capturing group
\] a square bracket, indicating the end of the email
$ end of the line
Note that your Regex is missing the .* parts to match the things between the key characters [ and ].
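A short sketch of using this pattern in C#, reading the bracketed text from capturing group 1 (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

string line = "2.2|[johnnyappleseed@example.com]";

// Group 1 captures everything between the last "[" and the final "]".
Match match = Regex.Match(line, @"^.*\[(.*)\]$");

string inside = match.Success ? match.Groups[1].Value : string.Empty;
Console.WriteLine(inside);
```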
Your regex - ^\[\]$ - matches a single string/line that only contains [], but you need to obtain a substring in between the square brackets somewhere further inside a larger string.
You can use
var rx = new Regex(@"(?<=\[)[^]]+");
Console.WriteLine(rx.Match(s).Value);
See regex demo
With (?<=\[) we find the position after [ and then we match every character that is not ] with [^]]+.
Another, non-regex way:
var s = "2.2|[johnnyappleseed@example.com]";
var ss = s.Split('|');
if (ss.Length > 1)
{
    var last = ss[ss.Length - 1];
    if (last.Contains("[") && last.Contains("@")) // We assume there is an email
        Console.WriteLine(last.Trim(new[] {'[', ']'}));
}
See IDEONE demo of both approaches

Regex - The minus sign is not getting caught when processing float numbers in C#

I'm trying to extract 2 numbers from the following string (named interim):
"location" : { "lat" : 42.3875968, "lng" : -71.0994968 },
Here's the code I use in C#:
// define a regex for float numbers
Regex rx = new Regex(@"\b-?[0-9]*\.?[0-9]+\b", RegexOptions.Compiled | RegexOptions.IgnoreCase);
// Find matches.
MatchCollection matches = rx.Matches(interim);
return matches[0].ToString() + ", " + matches[1].ToString();
The return is "42.3875968, 71.0994968", without the minus sign for the second float number.
I debugged into the code that can confirm that the "-" is not in the result of matches var.
I've also tested the following regexes, for the same result:
[-+]?[0-9]*\.?[0-9]+
(-|+)?[0-9]*\.?[0-9]+
(-|\+)?[0-9]*\.?[0-9]+
(-\+)?[0-9]*\.?[0-9]+
Any one have an idea why this doesn't work?
Thanks,
Mylo
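I suspect that your problem is that the \b at the beginning and end of your expression are not what you really want. It will work if you remove them and just use @"-?[0-9]*\.?[0-9]+".
\b asserts a boundary between a word character and a non-word character. In your string the - is preceded by a space, and both are non-word characters, so there is no boundary in front of the minus sign. The leading \b can only succeed just before the 7, and the optional -? then matches nothing.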
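The effect of the word boundaries can be checked with a small sketch like the one below, which runs both patterns over the string from the question (the variable names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string interim = @"""location"" : { ""lat"" : 42.3875968, ""lng"" : -71.0994968 },";

var withBoundaries = new Regex(@"\b-?[0-9]*\.?[0-9]+\b");
var withoutBoundaries = new Regex(@"-?[0-9]*\.?[0-9]+");

// Join all matches of each pattern for easy comparison.
string bounded = string.Join(", ",
    withBoundaries.Matches(interim).Cast<Match>().Select(m => m.Value));
string unbounded = string.Join(", ",
    withoutBoundaries.Matches(interim).Cast<Match>().Select(m => m.Value));

Console.WriteLine(bounded);
Console.WriteLine(unbounded);
```

Only the version without \b keeps the minus sign on the longitude.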
Your regular expression is wrong in the code above for the string you're reading.
If you use a regex tester for c# you'll see that the above regex "\b-?[0-9]*\.?[0-9]+\b" does not include the minus sign. If you change it slightly to "\b*-?[0-9]*\.?[0-9]+\b" then you will capture the - sign in the match.
Source: http://regexhero.net/tester/
Others say the \b is causing problems.
It could be all you need is something like this:
// ([-+]?[0-9]*\.?[0-9]*(?<=\d\.|\d))
( # (1 start)
[-+]? [0-9]* \.? [0-9]*
(?<= \d \. | \d )
) # (1 end)
Edit:
Comments are broken for my login.
To @JNYRanger
\b*-?[0-9]*\.?[0-9]+\b
\b* means optionally match a word boundary forever.
\b? means optionally match a word boundary once.
\b means match a word boundary once.
\b is a zero-width assertion; if it's optional, there is usually no point in including it.
