Regex expression has me stumped - c#

Hi, I'm trying to work out a regex problem.
I have a formula parser that needs to get the strings within some tags, the input string will look like this:
[//part1/part2/abc]+[/def]+[ghi]
All I want returned is three groups like this:
abc
def
ghi
I have a partially working regex that gets me three groups, and the strings between the square brackets, but I just can't get rid of the prefixed path.
If there is a path, it will always use forward slashes.
\[(.*?)\]
Can anyone please help?

Try the regex below, which captures one or more word characters immediately before the ] bracket:
(\w+)\]

You can try this:
@"\[(?:[^]/]*/)*([^]/]+)]"
The optional non-capturing group consumes every path segment that ends with a slash, so the capturing group is left with only the final segment.
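A minimal sketch of how that capture-group pattern could be used from C#. The class and method names (LastSegmentSketch, Extract) are made up for illustration; only the pattern comes from the answer above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LastSegmentSketch
{
    // \[            opening bracket
    // (?:[^]/]*/)*  zero or more path segments, each ending in a slash (not captured)
    // ([^]/]+)      group 1: the final segment, containing no slash or closing bracket
    // ]             closing bracket
    private static readonly Regex Tag = new Regex(@"\[(?:[^]/]*/)*([^]/]+)]");

    // Returns the last path segment of every [bracketed] tag, in order of appearance.
    public static List<string> Extract(string formula)
    {
        var names = new List<string>();
        foreach (Match m in Tag.Matches(formula))
        {
            names.Add(m.Groups[1].Value);
        }
        return names;
    }
}
```

Calling LastSegmentSketch.Extract("[//part1/part2/abc]+[/def]+[ghi]") should give abc, def and ghi.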

Direct Match (No Capture)
This regex is a bit more complex because rather than capturing what you want to Group 1, we are matching it directly.
The regex also validates that we are inside the [brackets]:
(?x) # free-spacing mode
(?<= # look behind: we should see
\[ # an opening bracket, then
(?:(?:[^/\]]*/+)+)? # optionally, one or more series of
# non-slashes, non-closing brackets followed by slashes
) # end lookbehind
[^/\]]* # this is what we want to match: any character that is not a / or a ]
(?=[^/]*\]) # lookahead: we should see no slashes, then a closing ]
You can actually use it in this free-spacing mode, which makes it easy to maintain and understand later. The explanation is in the comments.
In C# code:
Here is one way to use this regex in C#:
Regex yourRegex = new Regex(@"(?x) # free-spacing mode
(?<= # look behind: we should see
\[ # an opening bracket, then
(?:(?:[^/\]]*/+)+)? # optionally, one or more series of
# non-slashes, non-closing brackets followed by slashes
) # end lookbehind
[^/\]]* # this is what we want to match: any character that is not a / or a ]
(?=[^/]*\]) # lookahead: we should see no slashes, then a closing ]
");
MatchCollection allMatchResults = yourRegex.Matches(yourstring);
if (allMatchResults.Count > 0) {
// Access individual matches with allMatchResults[i].Value
}

Related

Regular Expression that matches on values after a pipe in between brackets

I'm still learning a lot about regex, so please forgive any naivety.
I've been using this site to test:
http://www.systemtextregularexpressions.com/regex.match
Basically, I'm having issues writing a regular expression that will match on any value after a pipe in between brackets.
Given an example string of:
"<div> \n [dont1.dont2|match1|match2] |dont3 [dont4] dont5. \n </div>"
Expected output would be a collection:
match1,
match2
The closest I've been able to get so far is:
(?!\[.*(\|)\])(?:\|)([\w-_.,:']*)
Above gives me the values, including the pipes, and dont3.
I've also tried this guy:
\|(.*(?=\]))
but it outputs:
|match1|match2
Here's one way of doing it:
(?<=\[[^\]]*\|)[^\]|]*
Here's the meaning of the pattern:
(?<=\[[^\]]*\|) - Lookbehind expression to ensure that any match must be preceded by an open bracket, followed by any number of non-close-bracket characters, followed by a pipe character
(?<= ... ) - Declares a lookbehind expression. Something matching the lookbehind must immediately precede the text in order for it to match. However, the part matched by the lookbehind is not included in the resulting match.
\[ - Matches an open bracket character
[^\]]* - Matches any number of non-close-bracket characters
\| - Matches a pipe character
[^\]|]* - Matches any number of characters which are neither close brackets nor pipe characters.
The lookbehind is greedy, so it will allow for any number of pipes between the open bracket and the matching text.
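As a rough illustration, the lookbehind pattern could be applied like this in C#. The names PipeValuesSketch and Extract are hypothetical, and the check that skips empty matches is an addition of this sketch, not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeValuesSketch
{
    // Variable-length lookbehind is supported by the .NET regex engine.
    private static readonly Regex AfterPipe = new Regex(@"(?<=\[[^\]]*\|)[^\]|]*");

    // Returns every value that follows a pipe inside a [bracketed] section.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in AfterPipe.Matches(input))
        {
            // The pattern can match an empty string (for example between
            // two adjacent pipes), so empty matches are skipped here.
            if (m.Length > 0)
            {
                values.Add(m.Value);
            }
        }
        return values;
    }
}
```

For the example string from the question, this should return match1 and match2; dont3 is rejected because a closing bracket sits between the nearest opening bracket and its pipe.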
Try this:
\[.*?(?:\|(?<mydata>.*?))+\]
Note: the online tool will only show you the last capture inside a quantified () for a given match, but .NET remembers each capture of a group that matches multiple times.
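A short sketch of reading every capture of the repeated mydata group through Group.Captures. The wrapper names (PipeCapturesSketch, Extract) are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PipeCapturesSketch
{
    private static readonly Regex Bracketed =
        new Regex(@"\[.*?(?:\|(?<mydata>.*?))+\]");

    // Collects every capture of the repeated "mydata" group, not just the last one.
    public static List<string> Extract(string input)
    {
        var values = new List<string>();
        foreach (Match m in Bracketed.Matches(input))
        {
            foreach (Capture c in m.Groups["mydata"].Captures)
            {
                values.Add(c.Value);
            }
        }
        return values;
    }
}
```

On the question's sample string this should yield match1 and match2. Note that .*? is not limited to the inside of one bracket pair, so input where a pipe follows a closing bracket on the same line may need a stricter character class such as [^\]|]*.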
Try this:
^<div>\s*[^|]+|([^|]+)|([^|]+)

regex to get substring before substring

I have a string like following,
hi,hello,-LSB-,ASPECT,-RSB-,you
I want to extract the substring that comes immediately before -LSB-,ASPECT, back to the previous comma: hello in this case.
I have written regular expression like
\b\w+[/-/,LSB/-/,ASPECT]
However, it extracts the entire substring up to and including -LSB-,ASPECT, all the way back to the start, like this:
hi,hello,-LSB-,ASPECT
Any clue??
The regex for this (using a positive lookahead assertion) would be
[^,]*(?=,-LSB-,ASPECT,)
Explanation:
[^,]* # Match any number of characters except commas
(?= # until the following regex can be matched:
,-LSB-,ASPECT, # the literal text ",-LSB-,ASPECT,".
) # (End of lookahead assertion)
Careful, square brackets create a character class which you don't want in this case.
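A minimal C# sketch of the lookahead approach. BeforeMarkerSketch and TryExtract are illustrative names, not taken from the answer.

```csharp
using System.Text.RegularExpressions;

public static class BeforeMarkerSketch
{
    // Matches a run of non-comma characters that is directly followed by
    // the literal text ",-LSB-,ASPECT," (the lookahead is not consumed).
    private static readonly Regex BeforeMarker =
        new Regex(@"[^,]*(?=,-LSB-,ASPECT,)");

    // Returns true and sets token when the marker is found.
    public static bool TryExtract(string input, out string token)
    {
        Match m = BeforeMarker.Match(input);
        token = m.Value; // empty string when there is no match
        return m.Success;
    }
}
```

With the input hi,hello,-LSB-,ASPECT,-RSB-,you the token should be hello.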
Try this:
(\w+),-LSB-,ASPECT

RegEx to match a number in the second line

I need a regex to match a number in the second line. Sample input looks like this:
^C1.1
xC20
SS3
M 4
The decimal pattern (-?\d+(\.\d+)?) matches all the numbers, and I could pick out the second one with a loop in the code-behind, but I need a regular expression that directly matches the number in the second line.
/^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)/
This operates by capturing a single line at the beginning of the input:
^ Beginning of the string
[^\r\n]* Anything that isn't a line terminator
\r?\n A newline, optionally preceded by a carriage return
Then all the non digit characters, then your numbers.
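A small sketch of using this pattern from C#, without the surrounding / delimiters, which are not part of .NET pattern syntax. The names SecondLineNumberSketch and TryGet are assumptions for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SecondLineNumberSketch
{
    // Without RegexOptions.Multiline, ^ only matches at the start of the input,
    // so the pattern skips exactly one line before looking for the number.
    private static readonly Regex SecondLine =
        new Regex(@"^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)");

    // Returns true and sets number to the text captured in group 1.
    public static bool TryGet(string input, out string number)
    {
        Match m = SecondLine.Match(input);
        number = m.Groups[1].Value; // empty string when there is no match
        return m.Success;
    }
}
```

For the sample input this should capture 20. One caveat: \D also matches line breaks, so if the second line contains no digits the match continues into later lines; [^\d\r\n]*? in place of \D*? would confine it to the second line.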
Since you've now repeatedly changed your needs, try this on for size:
/(?<=\n\D*)-?\d+(\.\d+)?/
I was able to capture it with this regex.
.*\n\D*(\d*).*\n
Check out group 1 of anything that this matches:
^.*?\r\n.*?(\d+)
If that doesn't work, try this:
^.*?\r\n.*?(\d+)
Both are with multiline NOT set...
I would probably use the captured group in /^.*?\r?\n.*?(-?\d+(?:\.\d+)?)/ where…
^ # beginning of string
.*? # anything...
\r?\n # followed by a new line
.*? # anything...
( # followed by...
-? # an optional negative sign (minus)
\d+ # a number
(?: # -this part not captured explicitly-
\.\d+ # a dot and a number
)? # -and is optional-
)
If it is a flavor that supports lookbehind then there are other alternatives.

Seeking some C# RegEx help

I am trying to create a RegEx expression that will successfully parse the following line:
"57" "testing123" 82 16 # 13 26 blah blah
What I want to be able to do is identify the numbers in the line. Currently, what I'm using is this:
[0-9]+
which parses fine. However, where it gets tricky is if the number is in quotes, like "57" is or like "testing123" is, I do not want it to match.
In addition, I do not want to match anything at all after the hash sign ('#').
So in this example, the matches I should be getting are "82" and "16". Nothing else should match.
Any help on this would be appreciated.
It should be easier for you to build 3 different regexes, and then create the logic that combines them:
Check whether the string has a #, and ignore everything after it.
Check for all the matches of "\d+" (digits in quotes), and ignore all of them.
Check whether anything that's left matches [0-9]+
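The three steps could be sketched roughly as follows. This is an interpretation, not the answerer's code: step 2 here drops every quoted token rather than only quoted digit runs, so that "testing123" is skipped as well, and it assumes quoted tokens contain no escaped quotes and no # character. The names ThreeStepSketch and Numbers are made up.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ThreeStepSketch
{
    public static List<string> Numbers(string line)
    {
        // Step 1: ignore everything from the first # onwards.
        int hash = line.IndexOf('#');
        string code = hash >= 0 ? line.Substring(0, hash) : line;

        // Step 2: blank out quoted tokens (assumption: no escaped quotes inside).
        string unquoted = Regex.Replace(code, "\"[^\"]*\"", " ");

        // Step 3: whatever digit runs remain are the numbers we want.
        var numbers = new List<string>();
        foreach (Match m in Regex.Matches(unquoted, "[0-9]+"))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }
}
```

For the sample line from the question this should return 82 and 16.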
.NET regular expressions can parse this string rather easily. The following pattern should match everything up to the comment:
\A # Start of the string
(?>
(?<Quoted> # A quoted string
"" # Open quotes
[^""\\]* # non quotes or backslashes
(?:\\.[^""\\]*)* # but allow escaped characters
"" # Close quotes
)
|
(?<Number> # A number
\d+ # some digits
)
|
\s+ # Whitespace separator
)*
If you also want to match the comment, add:
(?<Comment>
\# .*
)?
\z
You can get your numbers in a single Match, using all captures of the "Number" group:
Match parsed = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);
CaptureCollection numbers = parsed.Groups["Number"].Captures;
Missing from this pattern is mainly unquoted string tokens, such as 4 8 this 15that, which can add some complexity, depending on how we'd want it to work.
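Putting the two pattern fragments and the two lines of code together, a self-contained sketch might look like the following. The class and method names are hypothetical, and the pattern is the one above with the optional comment part appended and the comments shortened.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineTokensSketch
{
    // Inside a verbatim string, "" stands for a single double-quote character.
    private const string Pattern = @"
        \A
        (?>
            (?<Quoted> "" [^""\\]* (?:\\.[^""\\]*)* "" )   # quoted string, escapes allowed
          | (?<Number> \d+ )                               # bare number
          | \s+                                            # whitespace separator
        )*
        (?<Comment> \# .* )?                               # optional trailing comment
        \z";

    // Returns the unquoted numbers before the comment, or an empty list
    // when the line does not fit the token grammar at all.
    public static List<string> Numbers(string line)
    {
        var result = new List<string>();
        Match parsed = Regex.Match(line, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!parsed.Success)
        {
            return result;
        }
        foreach (Capture c in parsed.Groups["Number"].Captures)
        {
            result.Add(c.Value);
        }
        return result;
    }
}
```

For the line in the question this should give 82 and 16. A line containing unquoted words fails to match as a whole, which is the limitation mentioned above.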

Regular expression to find separator dots in formula

The C# expression library I am using will not directly support my table/field parameter syntax:
The following are table/field parameter names that are not directly supported:
TableName1.FieldName1
[TableName1].[FieldName1]
[Table Name 1].[Field Name 1]
It accepts alphanumeric parameters without spaces, or most characters enclosed within square brackets. I would like to use C# regular expressions to replace the dot separators and neighboring brackets with a different delimiter, so the results would be as follows:
[TableName1|FieldName1]
[TableName1|FieldName1]
[Table Name 1|Field Name 1]
I also need to skip any string literals within single quotes, like:
'TableName1.FieldName1'
And, of course, ignore any numeric literals like:
12345.6789
EDIT: Thank you for your feedback on improving my question. Hopefully it is clearer now.
I've written a completely new answer, now that the problem is clarified:
You can do this in a single regex. It is quite bulletproof, I think, but as you can see, it's not exactly self-explanatory, which is why I've commented it liberally. Hope it makes sense.
You're lucky that .NET allows re-use of named capturing groups, otherwise you would have had to do this in several steps.
resultString = Regex.Replace(subjectString,
@"(?: # Either match...
(?<before> # (and capture into backref <before>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters,
) # (End of capturing group <before>).
\. # then a literal dot,
(?<after> # (now capture again, into backref <after>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters.
) # (End of capturing group <after>) and end of match.
| # Or:
\[ # Match a literal [
(?<before> # (now capture into backref <before>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <before>).
\]\.\[ # Match literal ].[
(?<after> # (capture into backref <after>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <after>).
\] # Match a literal ]
) # End of alternation. The match is now finished, but
(?= # only if the rest of the line matches either...
[^']*$ # only non-quote characters
| # or
[^']*'[^']*' # contains exactly two more quote characters (one complete quoted literal)
[^']* # plus any number of non-quote characters
$ # until the end of the line.
) # End of the lookahead assertion.",
"[${before}|${after}]", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
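For reference, here is a condensed, self-contained sketch of the same replacement. The pattern is the one above with the comments shortened; the names FieldSeparatorSketch and Rewrite are invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class FieldSeparatorSketch
{
    // .NET lets both alternatives reuse the group names "before" and "after".
    private static readonly Regex Separator = new Regex(@"
        (?:
            (?<before>(?=\w*\p{L})\w+)      # identifier containing a letter
            \.
            (?<after>(?=\w*\p{L})\w+)       # identifier containing a letter
          |
            \[(?<before>[^\]]+)\]           # bracketed name
            \.
            \[(?<after>[^\]]+)\]            # bracketed name
        )
        (?=[^']*$|[^']*'[^']*'[^']*$)       # zero or two quotes left on the line
        ", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // Rewrites Table.Field and [Table].[Field] as [Table|Field].
    public static string Rewrite(string formula)
    {
        return Separator.Replace(formula, "[${before}|${after}]");
    }
}
```

Under these assumptions, TableName1.FieldName1 and [Table Name 1].[Field Name 1] should come back as [TableName1|FieldName1] and [Table Name 1|Field Name 1], while 'TableName1.FieldName1' and 12345.6789 should be left unchanged. Because the lookahead only allows zero or two remaining quote characters, a line with several quoted literals after the match is not handled by this sketch.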
