Regex to match pattern consisting of several groups - c#

I have the following regular expression, but it only matches the last occurrence of the pattern found. The regular expression is designed to match the following pattern:
A single digit followed by a \
A word followed by \
Another word followed by \
The fourth group can be either a single word followed by a \ or 2 or 3 words followed by \
The fifth group must be a floating point number in the format 00.00
The regular expression is:
([0-9])\\(\w+)\\(\w+)\\((\w+\s+\w+\s+\w+)|\w+\s\w+|\w+)\\([+-]?\d*\.\d+)(?![-+0-9\\.])\\
The string being matched is:
2\James\Brown\Football Club Mu\15.45\1\Jessie\Ellis\Football Club Performance\15.48\4\Dane\Brown\FC Football \15.52\5\Richardo\Flemmings\Football Club Striders\15.53\7\Lawrence\Brown\Football Club Testing\15.53\8\Jermy\Black\Football Club Ch\15.34\\
The last record is matched only if the regular expression does not end with \\ and the input string does not end with "\\".
Note that the input string always ends with "\\".

The regex you provided doesn't appear to work at all. I can't make out what you're trying to do with that, especially the '+' and '-' characters. To perfectly match your definition, I've got this:
([0-9])\\\w+\\\w+\\(\w+( \w+)?( \w+)?)\\[0-9][0-9]\.[0-9][0-9]
However, your examples don't quite match your definition: each record has a trailing '\', and the third example has a trailing space in the fourth group. Assuming those examples are valid, I've modified it to this:
([0-9])\\\w+\\\w+\\(\w+( \w+)?( \w+)?) ?\\[0-9][0-9]\.[0-9][0-9]\\
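A minimal sketch of how the second pattern could be applied to the sample string from the question. It assumes C# 9 or later (top-level statements) and uses Regex.Matches, which returns every non-overlapping match rather than only one. The variable names are illustrative and not taken from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question; a verbatim string keeps every backslash literal.
string input = @"2\James\Brown\Football Club Mu\15.45\1\Jessie\Ellis\Football Club Performance\15.48\4\Dane\Brown\FC Football \15.52\5\Richardo\Flemmings\Football Club Striders\15.53\7\Lawrence\Brown\Football Club Testing\15.53\8\Jermy\Black\Football Club Ch\15.34\\";

// Second pattern from the answer above, unchanged.
string pattern = @"([0-9])\\\w+\\\w+\\(\w+( \w+)?( \w+)?) ?\\[0-9][0-9]\.[0-9][0-9]\\";

// Each match corresponds to one record in the input.
MatchCollection records = Regex.Matches(input, pattern);

foreach (Match record in records)
{
    // Group 1 is the leading digit, group 2 the one-to-three-word field.
    Console.WriteLine($"{record.Groups[1].Value}: {record.Groups[2].Value}");
}
```

Because the optional space sits outside group 2, the third record yields "FC Football" without the trailing space.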

Related

Regular expression to match capture group not preceded by certain characters

I want to write a regex that will match if and only if the pattern is not preceded by the characters "Etc/".
Strings that should match:
GMT+01:00
UTC+01:00
UTC+01
+01:00
...
Strings that should not match:
Etc/GMT+01:00
Etc/UTC+01:00
Etc/UTC+01
...
This is what I have so far:
(?<!Etc\/)((UTC|GMT)?(\+|\-){1}(\d{1,2})(:|\.)?(\d{1,2})?)
The right part of the above regular expression already matches the UTC and GMT offset and covers all the cases I need. But I don't manage to implement the exceptions mentioned above.
I expected the above regex to not match the string Etc/GMT+01:00. But in fact it matches the part +01:00 and only ignores Etc/GMT.
How can I make sure that the following regular expression does not match if it is preceded by "Etc/"?
(UTC|GMT)?(\+|\-){1}(\d{1,2})(:|\.)?(\d{1,2})?
Here I have an example with most of the use cases I need.

You may add \S* after Etc/ to make sure Etc/ is checked even if there are any zero or more non-whitespace chars between Etc/ and the expected match:
(?<!\bEtc/\S*)((UTC|GMT)?([+-])(\d{1,2})[:.]?(\d{1,2})?)
Details:
(?<!\bEtc/\S*) - a negative lookbehind that matches a location that is not immediately preceded with a whole word Etc/ and then zero or more non-whitespace chars
(UTC|GMT)? - an optional substring, UTC or GMT
([+-]) - + or -
(\d{1,2}) - one or two digits
[:.]? - an optional : or .
(\d{1,2})? - an optional sequence of one or two digits (equal to (\d{0,2})).
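As a quick check, the lookbehind pattern can be run against the sample strings from the question. This is a sketch that assumes C# 9 or later (top-level statements); the variable names are illustrative. It relies on .NET supporting variable-length lookbehind, which many other regex engines do not.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer above: the lookbehind rejects any position
// that follows "Etc/" plus zero or more non-whitespace characters.
var offset = new Regex(@"(?<!\bEtc/\S*)((UTC|GMT)?([+-])(\d{1,2})[:.]?(\d{1,2})?)");

string[] samples =
{
    "GMT+01:00", "UTC+01:00", "UTC+01", "+01:00",   // should match
    "Etc/GMT+01:00", "Etc/UTC+01:00", "Etc/UTC+01"  // should not match
};

foreach (string sample in samples)
{
    Match m = offset.Match(sample);
    string result = m.Success ? m.Value : "(no match)";
    Console.WriteLine(sample + " -> " + result);
}
```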

As you are already capturing all data in groups, another way could be getting all the matches of Etc/ out of the way, and use your pattern to capture what you want in the groups.
Note that you can change groupings of single chars like (:|\.) to a character class ([:.])
\bEtc/\S*|(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})?
\bEtc/\S* Match Etc/ and optional non whitespace chars
| Or
(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})? Your pattern with all the separate groups.
Or with just a single group:
\bEtc/\S*|((?:GMT|UTC)?\+\d{2}(?:[:.]\d{2})?)
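With the alternation approach, the calling code has to tell the two branches apart. One way, sketched here under the assumption of C# 9 or later, is to keep only the matches in which the sign group (group 2 of the multi-group pattern) participated. The input string and variable names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Multi-group pattern from the answer above.
var pattern = new Regex(@"\bEtc/\S*|(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})?");

// Hypothetical input mixing an Etc/ zone with two plain offsets.
string input = "Etc/GMT+01:00 UTC+02:30 +05";

var offsets = new List<string>();
foreach (Match m in pattern.Matches(input))
{
    // Group 2 (the sign) is set only when the second branch matched,
    // so matches consumed by the Etc/ branch are skipped here.
    if (m.Groups[2].Success)
    {
        offsets.Add(m.Value);
    }
}

Console.WriteLine(string.Join(", ", offsets));
```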

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use Regex.IsMatch() function. However if I use RegularExpressionAttribute in ASP.NET MVC, I always get invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
The RegularExpressionAttribute specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression.
Which means I need "abcdefg" to match, and ".abcdefg" to not match. Basically, I need to negate the whole expression I have above.

You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch), use a negated character class:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape / and ? chars inside the .NET regex since you can't use regex delimiters there.
The C# declaration of the last pattern will look like
@"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
RegularExpressionAttribute:
[RegularExpression(
    @"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
    ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")]
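The difference between Regex.IsMatch and the attribute can be sketched as follows. MatchesWholeString is a hypothetical helper that imitates the attribute's rule (the match must start at index 0 and cover the entire value); it is not part of any library. The third pattern, inverted, is an assumption added here for illustration and does not come from the answers: it is one possible way to accept only the clean names, as the EDIT in the question asks. The sketch assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true only when the first match spans the whole input.
static bool MatchesWholeString(string input, string pattern)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

// Pattern from the question: matches a single offending character.
string original = @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$";
// Last pattern from the answer: consumes the whole offending string.
string widened = @"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*";
// Assumption (not from the answers): accepts only names without the listed
// characters and without a leading or trailing dot.
string inverted = @"^(?![.])[^ \][;/\\?:*""<>|+=]*(?<![.])$";

string[] samples = { "abcdefg", ".abcdefg", "abc.defg", "abcdefg.", "abc[defg" };
foreach (string s in samples)
{
    Console.WriteLine(s
        + ": IsMatch(original)=" + Regex.IsMatch(s, original)
        + ", whole(original)=" + MatchesWholeString(s, original)
        + ", whole(widened)=" + MatchesWholeString(s, widened)
        + ", whole(inverted)=" + MatchesWholeString(s, inverted));
}
```

For ".abcdefg", the original pattern is found by Regex.IsMatch but covers only the leading dot, so a whole-string check rejects it, while the widened pattern covers the entire value.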

Your regex is an alternation of three branches, each matching a single character: the first is a character class containing several characters, the second matches a dot at the start of the string, and the third matches a dot at the end of the string.
It works with Regex.IsMatch because one of the alternatives does match, just not the whole string that you want to match.
You could use three alternatives: the first matches a dot followed by one or more characters from the character class until the end of the string; the second is the other way around, with the dot at the end of the string; and the third uses a positive lookahead to assert that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$

Cannot match parentheses in regex group

This is a regular expression, evaluated in .NET
I have the following input:
${guid->newguid()}
And I want to produce two matching groups, a character sequence after the ${ and before }, which are split by -> :
guid
newguid()
The pattern I am using is the following:
([^(?<=\${)(.*?)(?=})->]+)
But this doesn't match the parentheses, I am getting only the following matches:
guid
newguid
How can I modify the regex so I get the desired groups?

Your regex - ([^(?<=\${)(.*?)(?=})->]+) - matches 1+ characters other than those defined in the negated character class (that is, 1 or more chars other than (, ?, <, etc).
I suggest using a matching regex like this:
\${([^}]*?)->([^}]*)}
The results you need are in match.Groups[1] and match.Groups[2].
Pattern details:
\${ - match ${ literal character sequence
([^}]*?) - Group 1 capturing 0+ chars other than } as few as possible
-> - a literal char sequence ->
([^}]*) - Group 2 capturing 0+ chars other than } as many as possible
} - a literal }.
If you know that you only have word chars inside, you may simplify the regex to a mere
\${(\w+)->(\w+\(\))}
However, it is much less generic.
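A short sketch of the first suggested pattern applied to the input from the question, assuming C# 9 or later (top-level statements). The variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "${guid->newguid()}";

// Group 1 is the text before "->", group 2 the text after it,
// both restricted to characters other than "}".
Match match = Regex.Match(input, @"\${([^}]*?)->([^}]*)}");

if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
    Console.WriteLine(match.Groups[2].Value);
}
```

Since the second group accepts any character other than }, the parentheses of newguid() are kept.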

Is your input structure always ${identifier->identifier()}? If so, you can use ^\$\{([^-]+)->([^}]+)\}$.
Otherwise, you can modify your regex to ([^?<=\${.*??=}\->]+): with it you should match the input and get the desired groups, guid and newguid(). The key changes are escaping the - character, which is otherwise interpreted as a range operator inside a character class, and dropping the parentheses from the class, because [^......(....)....] excludes parentheses from the match.
I hope that helps!
EDIT: testing with https://regex101.com helped me a lot; it showed me that - was being interpreted as a range operator.

Regular Expression that matches on values after a pipe in between brackets

I'm still learning a lot about regex, so please forgive any naivety.
I've been using this site to test:
http://www.systemtextregularexpressions.com/regex.match
Basically, I'm having issues writing a regular expression that will match on any value after a pipe in between brackets.
Given an example string of:
"<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>"
Expected output would be a collection:
match1,
match2
The closest I've been able to get so far is:
(?!\[.*(\|)\])(?:\|)([\w-_.,:']*)
Above gives me the values, including the pipes, and dont3.
I've also tried this guy:
\|(.*(?=\]))
but it outputs:
|match1|match2

Here's one way of doing it:
(?<=\[[^\]]*\|)[^\]|]*
Here's the meaning of the pattern:
(?<=\[[^\]]*\|) - Lookbehind expression to ensure that any match must be preceded by an open bracket, followed by any number of non-close-bracket characters, followed by a pipe character
(?<= ... ) - Declares a lookbehind expression. Something matching the lookbehind must immediately precede the text in order for it to match. However, the part matched by the lookbehind is not included in the resulting match.
\[ - Matches an open bracket character
[^\]]* - Matches any number of non-close-bracket characters
\| - Matches a pipe character
[^\]|]* - Matches any number of characters which are neither close brackets nor pipe characters.
The lookbehind is greedy, so it will allow for any number of pipes between the open bracket and the matching text.
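Applied to the example string from the question, the lookbehind pattern could be used like this. The sketch assumes C# 9 or later (top-level statements); the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>";

// Each match is the text after a pipe that sits inside an open bracket;
// the lookbehind itself is not part of the match value.
string[] values = Regex.Matches(input, @"(?<=\[[^\]]*\|)[^\]|]*")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", values));
```

The pipe before dont3 is skipped because a close bracket lies between it and the nearest preceding open bracket.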

Try this:
\[.*?(?:\|(?<mydata>.*?))+\]
Note: the online tool will only show you the last capture inside a quantified () for a given match, but .NET will remember each capture of a group that matches multiple times.
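In .NET, the individual captures of a repeated group are exposed through Group.Captures. A sketch of reading them for the pattern above, assuming C# 9 or later (top-level statements); the variable names are illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>";

Match match = Regex.Match(input, @"\[.*?(?:\|(?<mydata>.*?))+\]");

// Groups["mydata"].Value would hold only the last capture;
// Captures keeps every iteration of the repeated group.
CaptureCollection captures = match.Groups["mydata"].Captures;
foreach (Capture capture in captures)
{
    Console.WriteLine(capture.Value);
}
```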

Try this:
^<div>\s*[^|]+|([^|]+)|([^|]+)

Regex - documentation

Hello all, I am going through some old code and ran across a regex. I can't figure out what it does. Can anyone shed some light on it?
<(.|\n)*?>|{(.|\n)*?}
It was in a string.Replace statement.

Put your regex into Regex101.com
At the bottom is a guide titled Your regular expression explained

According to RegexBuddy this is what it does:
Match either the regular expression below (attempting the next alternative only if this one fails) «<(.|\n)*?>»
    Match the character “<” literally «<»
    Match the regular expression below and capture its match into backreference number 1 «(.|\n)*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
        Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*?»
        Match either the regular expression below (attempting the next alternative only if this one fails) «.»
            Match any single character that is not a line break character «.»
        Or match regular expression number 2 below (the entire group fails if this one fails to match) «\n»
            Match a line feed character «\n»
    Match the character “>” literally «>»
Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «{(.|\n)*?}»
    Match the character “{” literally «{»
    Match the regular expression below and capture its match into backreference number 2 «(.|\n)*?»
        Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
        Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*?»
        Match either the regular expression below (attempting the next alternative only if this one fails) «.»
            Match any single character that is not a line break character «.»
        Or match regular expression number 2 below (the entire group fails if this one fails to match) «\n»
            Match a line feed character «\n»
    Match the character “}” literally «}»
The matches are:
<>
<...>
{}
{...}
where ... is any text, including line breaks
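A sketch of what such a replace might do, assuming the old code called .NET's Regex.Replace with an empty replacement string (the question does not show the actual call). The input string is made up for illustration, and the sketch assumes C# 9 or later (top-level statements).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: two tags, a {0} placeholder, and a tag split across lines.
string input = "Hello <b>world</b> {0}!<br\n/>";

// Removes everything between < and > or between { and }, delimiters included.
// The (.|\n) alternative lets a match continue across line breaks.
string cleaned = Regex.Replace(input, @"<(.|\n)*?>|{(.|\n)*?}", "");

Console.WriteLine(cleaned);
```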
