The C# expression library I am using will not directly support my table/field parameter syntax:
The following are table/field parameter names that are not directly supported:
TableName1.FieldName1
[TableName1].[FieldName1]
[Table Name 1].[Field Name 1]
It accepts alphanumeric parameters without spaces, or most characters enclosed within square brackets. I would like to use C# regular expressions to replace the dot separators and neighboring brackets with a different delimiter, so the results would be as follows:
[TableName1|FieldName1]
[TableName1|FieldName1]
[Table Name 1|Field Name 1]
I also need to skip any string literals within single quotes, like:
'TableName1.FieldName1'
And, of course, ignore any numeric literals like:
12345.6789
EDIT: Thank you for your feedback on improving my question. Hopefully it is clearer now.
I've written a completely new answer, now that the problem is clarified:
You can do this in a single regex. It is quite bulletproof, I think, but as you can see, it's not exactly self-explanatory, which is why I've commented it liberally. Hope it makes sense.
You're lucky that .NET allows re-use of named capturing groups, otherwise you would have had to do this in several steps.
using System.Text.RegularExpressions;
resultString = Regex.Replace(subjectString,
@"(?: # Either match...
(?<before> # (and capture into backref <before>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters,
) # (End of capturing group <before>).
\. # then a literal dot,
(?<after> # (now capture again, into backref <after>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters.
) # (End of capturing group <after>) and end of match.
| # Or:
\[ # Match a literal [
(?<before> # (now capture into backref <before>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <before>).
\]\.\[ # Match literal ].[
(?<after> # (capture into backref <after>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <after>).
\] # Match a literal ]
) # End of alternation. The match is now finished, but
(?= # only if the rest of the line contains...
(?:[^'\n]*'[^'\n]*')* # an even number of quote characters (zero or more pairs),
[^'\n]* # plus any number of other characters
$ # until the end of the line.
) # End of the lookahead assertion.",
"[${before}|${after}]", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
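To try the idea end to end, here is a minimal, self-contained C# sketch (the names FieldReferenceRewriter and Rewrite are made up for this example). It uses the same two alternatives as the commented regex above, compacted onto fewer lines; the trailing lookahead is written as zero or more pairs of quotes and uses [^'\n] so that it only looks at the rest of the current line.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites Table.Field references into [Table|Field].
public static class FieldReferenceRewriter
{
    // Alternative 1: word.word, where each side contains at least one letter
    //                (so numeric literals like 12345.6789 are skipped).
    // Alternative 2: [anything].[anything] in square brackets.
    // Lookahead:     the rest of the line must contain an even number of
    //                single quotes, i.e. the match is not inside '...'.
    private static readonly Regex Pattern = new Regex(@"
        (?:
            (?<before>(?=\w*\p{L})\w+) \. (?<after>(?=\w*\p{L})\w+)
          |
            \[ (?<before>[^\]]+) \]\.\[ (?<after>[^\]]+) \]
        )
        (?=(?:[^'\n]*'[^'\n]*')*[^'\n]*$)",
        RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);

    // .NET lets both alternatives reuse the group names "before" and "after".
    public static string Rewrite(string expression) =>
        Pattern.Replace(expression, "[${before}|${after}]");
}
```

For example, Rewrite("TableName1.FieldName1 + 'a.b'") returns "[TableName1|FieldName1] + 'a.b'": the quoted a.b is left alone because only one quote follows it on the line.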
I have some strings formatted as follows:
1=case1,case2,..caseN;2=case1,..,caseN;3=case1, ..,caseN
Note: a semicolon (;) separates the numbered groups, a comma separates the cases within a group, and case1, case2, ... can be anything (strings or numbers); their type doesn't matter.
I want a regex pattern that matches strings like
1=home,house;2=abc;3=2019,2021
but does not match the following:
1=home,;2=abc;3=2019,2021 (extra comma in case 1)
1=;2=abc,2012;3= (must be 1=..; not 1=;)
1=home,age;2 (must be 2=.. not 2)
2=home;;3=sea (must be ;3 not ;;3)
4=flower;k3=sea (must be 3=, not k3)
I tried the pattern (\d+={1}[^;]+;), but it still matches even when the rest of the string is invalid.
Please show me the way.
Many thanks!
Maybe this pattern helps you out:
^\b(?:(?:^|;)\d+=[^,;]+(?:,[^,;]+)*)+$
^ - Start-of-string anchor.
\b - Word boundary.
(?: - Open 1st non-capture group.
(?: - Open 2nd non-capture group.
^|; - Alternation between the start-of-string anchor and a semicolon.
) - Close 2nd non-capture group.
\d+= - One or more digits followed by a =.
[^,;]+ - Negated character class: one or more characters other than a comma or semicolon.
(?: - Open 3rd non-capture group.
, - A comma.
[^,;]+ - Negated character class: one or more characters other than a comma or semicolon.
)* - Close 3rd non-capture group and match it zero or more times.
)+ - Close 1st non-capture group and match it one or more times.
$ - End-of-string anchor.
Note: I went with a negated character class because you mentioned that "case1, case2 are anything like strings, number doesn't matter their type", so I assumed a case can contain spaces, special characters, or anything else other than a comma or semicolon.
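To check the pattern against the examples from the question, a small C# harness like the following could be used (CaseListValidator is a hypothetical name). Because the pattern is anchored with ^ and $, Regex.IsMatch only succeeds when the whole string is valid.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the pattern from this answer.
public static class CaseListValidator
{
    // Numbered groups separated by ';', cases inside a group separated by ','.
    // A case may contain anything except ',' and ';'.
    private static readonly Regex Pattern =
        new Regex(@"^\b(?:(?:^|;)\d+=[^,;]+(?:,[^,;]+)*)+$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```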
This works on regex101 (note that \d= only allows single-digit group numbers):
^(?:\d=(?:\w{1,},)*(?:\w{1,});)*(?:\d=(?:\w{1,},)*\w{1,})$
^(?:\d+=[a-z\d]+(?:,[a-z\d]+)*(?:;|$))+$
^ : match the beginning of the string
(?: : begin a non-capturing group
\d+=[a-z\d]+ : match 1+ digits, then '=', then 1+ lowercase letters or digits
(?:,[a-z\d]+) : match ',' then 1+ lowercase letters or digits, in a non-capturing group
* : repeat that non-capturing group 0+ times
(?:;|$) : match ';' or the end of the string
)+ : end the outer non-capturing group and repeat it 1+ times
$ : match the end of the string
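Recursive subroutine calls such as (?1) are supported by PCRE but not by .NET, so this first version won't work in C#. If your flavor supports them, use:
^(\d+=\w+(?:,\w+)*)(?:;(?1))*$
Otherwise (for example in C#), use: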
^\d+=\w+(?:,\w+)*(?:;\d+=\w+(?:,\w+)*)*$
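Since .NET has no subroutine calls, one way to avoid writing the group pattern twice is to build the regex string from a reusable piece in C#. This sketch (ComposedPattern and Group are hypothetical names) produces exactly the non-recursive pattern above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: emulates the (?1) reuse by string composition.
public static class ComposedPattern
{
    // One "N=case1,case2,..." group; \w+ allows letters, digits and '_' per case.
    private const string Group = @"\d+=\w+(?:,\w+)*";

    // Yields ^\d+=\w+(?:,\w+)*(?:;\d+=\w+(?:,\w+)*)*$
    private static readonly Regex Pattern =
        new Regex("^" + Group + "(?:;" + Group + ")*$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);
}
```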
I need a regex to match the number on the second line. The input looks like this:
^C1.1
xC20
SS3
M 4
The decimal pattern (-?\d+(\.\d+)?) matches all the numbers, and I could pick out the second one in a loop in the code-behind, but I need a regular expression that gets the number on the second line directly.
/^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)/
This works by matching the first line of the input and then moving on to the second:
^ Beginning of the string
[^\r\n]* Anything that isn't a line terminator
\r?\n A newline, optionally preceded by a carriage return
\D*? Any non-digit characters (as few as possible), followed by your number pattern, captured in group 1.
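In C# you would drop the /.../ delimiters and read the number from group 1. A minimal sketch (SecondLineNumber and Find are hypothetical names):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper using the pattern from this answer.
public static class SecondLineNumber
{
    private static readonly Regex Pattern =
        new Regex(@"^[^\r\n]*\r?\n\D*?(-?\d+(\.\d+)?)");

    // Returns the first number after the first line break, or null if none.
    public static string Find(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

One caveat: \D also matches line breaks, so if the second line contains no digits, the match continues onto later lines. Using [^\d\n]*? instead of \D*? would keep it on line two.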
Since you've now repeatedly changed your needs, try this on for size:
/(?<=\n\D*)-?\d+(\.\d+)?/
I was able to capture it with this regex.
.*\n\D*(\d*).*\n
Check group 1 of whatever this matches:
^.*?\r\n.*?(\d+)
If that doesn't work (for example, because your line breaks are \n without \r), try this:
^.*?\n.*?(\d+)
Both assume multiline mode is NOT set.
I would probably use the captured group in /^.*?\r?\n.*?(-?\d+(?:\.\d+)?)/, where:
^ # beginning of string
.*? # anything...
\r?\n # followed by a new line
.*? # anything...
( # followed by...
-? # an optional negative sign (minus)
\d+ # a number
(?: # -this part not captured explicitly-
\.\d+ # a dot and a number
)? # -and is optional-
) # end of the captured number
If your flavor supports lookbehind, there are other alternatives.
I couldn't find a regex for my problem. All the examples I find handle escaping with a backslash,
but I need escaping by doubling the enclosing character.
Example: 'o''reilly'
Result: o'reilly
'(?:''|[^']*)*'
will match a quote-delimited string that may contain doubled (escaped) quotes. So that's your regex for finding those strings.
Explanation:
' # Match a single quote.
(?: # Either match... (use (?> instead of (?: if you can)
'' # a doubled quote
| # or
[^']* # anything that's not a quote
)* # any number of times.
' # Match a single quote.
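Here is a C# sketch that collects all such literals from a piece of text (DoubledQuoteStrings and FindAll are hypothetical names). It follows the hint above and uses an atomic group; I also changed [^']* to [^']+ so that each iteration consumes at least one character. With a plain (?: group, a * nested inside another * can backtrack heavily on an unterminated literal.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds '...' literals where '' stands for one quote.
public static class DoubledQuoteStrings
{
    // Atomic group: a doubled quote, or a run of non-quote characters.
    private static readonly Regex Pattern = new Regex(@"'(?>''|[^']+)*'");

    public static List<string> FindAll(string text)
    {
        var result = new List<string>();
        foreach (Match m in Pattern.Matches(text))
            result.Add(m.Value);
        return result;
    }
}
```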
To then remove the quotes correctly, you could do it in two steps:
First, search for (?<!')'(?!') to find all lone single quotes, and replace them with nothing.
Explanation:
(?<!') # Assert that the previous character (if present) isn't a quote
' # Match a quote
(?!') # Assert that the next character (if present) isn't a quote
Second, search for '' and replace all with '.
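The two steps translate directly to C#. For comparison, the sketch also includes an alternative that is not from this answer: if you already know the input is exactly one quoted literal, you can strip the first and last characters and then collapse ''. QuoteUnescaper, TwoStep and ByPosition are hypothetical names.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for unescaping a single '...' literal.
public static class QuoteUnescaper
{
    // The answer's approach: drop lone quotes, then turn '' into '.
    public static string TwoStep(string quoted)
    {
        string withoutLoneQuotes = Regex.Replace(quoted, "(?<!')'(?!')", "");
        return withoutLoneQuotes.Replace("''", "'");
    }

    // Alternative: assumes the input is one well-formed literal of length >= 2,
    // so the outer quotes are simply the first and last characters.
    public static string ByPosition(string quoted)
    {
        string inner = quoted.Substring(1, quoted.Length - 2);
        return inner.Replace("''", "'");
    }
}
```

The two-step version relies on the outer quotes being "lone", so it gives the wrong result for values that start or end with a quote: TwoStep("''''") returns '' instead of '. ByPosition handles those cases, at the cost of assuming its input is a single, well-formed literal.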
I have a collection of strings, and all I want from the regex is to collect every link that starts with http.
href="http://www.test.com/cat/1-one_piece_episodes/"href="http://www.test.com/cat/2-movies_english_subbed/"href="http://www.test.com/cat/3-english_dubbed/"href="http://www.exclude.com"
This is my regular expression pattern:
href="(.*?)[^#]"
and it returns this:
href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"
What is the pattern for excluding the last match, i.e. excluding matches that contain the excluded domain, like href="http://www.exclude.com"?
EDIT:
For multiple exclusions:
href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"
@ridgerunner and I would change the regex to:
href="((?:(?!\bexclude\b)[^"])*)[^#]"
It matches all href attributes as long as they don't end in # and don't contain the word exclude.
Explanation:
href=" # Match href="
( # Capture...
(?: # the following group:
(?! # Look ahead to check that the next part of the string isn't...
\b # the entire word
exclude # exclude
\b # (\b are word boundary anchors)
) # End of lookahead
[^"] # If successful, match any character except for a quote
)* # Repeat as often as possible
) # End of capturing group 1
[^#]" # Match a non-# character and the closing quote.
To allow multiple "forbidden words":
href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
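Used from C#, the multi-word pattern might look like the sketch below (HrefFilter, Links and the word list are illustrative). Note that group 1 ends one character early, because [^#] consumes the last character of the URL, so the sketch returns the whole match instead. In a verbatim string each " in the pattern is written as "".

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper built on the multi-word exclusion pattern above.
public static class HrefFilter
{
    // Regex: href="((?:(?!\b(?:exclude|xxxx)\b)[^"])*)[^#]"
    private static readonly Regex Pattern =
        new Regex(@"href=""((?:(?!\b(?:exclude|xxxx)\b)[^""])*)[^#]""");

    public static List<string> Links(string html)
    {
        var links = new List<string>();
        foreach (Match m in Pattern.Matches(html))
            links.Add(m.Value); // full href="..." text
        return links;
    }
}
```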
Your input isn't a valid C# string literal unless you escape the quotes in it, but you can also do this without a regex:
using System;
using System.Collections.Generic;
string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";
List<string> matches = new List<string>();
// Split on "href"; RemoveEmptyEntries drops the empty piece before the first "href".
foreach (var match in input.Split(new[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
    if (!match.Contains("exclude.com"))
        matches.Add("href" + match);
}
Will this do the job?
href="(?!http://[^/"]+exclude\.com)(.*?)[^#]"
I am trying to create a regular expression that will successfully parse the following line:
"57" "testing123" 82 16 # 13 26 blah blah
What I want to do is identify the numbers in the line. Currently, I'm using this:
[0-9]+
which works. The tricky part is that if a number is inside quotes, like "57" or the digits in "testing123", I do not want it to match.
In addition, I do not want to match anything at all after the hash sign (#).
So in this example, the matches I should be getting are "82" and "16". Nothing else should match.
Any help on this would be appreciated.
It should be easier to build three different regexes and then write the logic that combines them:
Check whether the string contains #, and ignore everything after it.
Find all matches of "\d+" (quoted numbers) and ignore them; to also skip digits inside other quoted text such as "testing123", match any quoted string with "[^"]*" instead.
Check whether anything that's left matches [0-9]+.
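A C# sketch of that three-step approach (UnquotedNumbers and Find are hypothetical names). It assumes # never appears inside a quoted string, and it replaces quoted strings with a space rather than nothing, so that digits on either side of a quoted string are not glued together.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper implementing the three steps above.
public static class UnquotedNumbers
{
    public static List<string> Find(string line)
    {
        // Step 1: drop everything from the first '#' onward
        // (assumes '#' never occurs inside quotes).
        int hash = line.IndexOf('#');
        if (hash >= 0)
            line = line.Substring(0, hash);

        // Step 2: blank out quoted strings so digits inside them can't match.
        line = Regex.Replace(line, "\"[^\"]*\"", " ");

        // Step 3: the remaining digit runs are the unquoted numbers.
        var numbers = new List<string>();
        foreach (Match m in Regex.Matches(line, "[0-9]+"))
            numbers.Add(m.Value);
        return numbers;
    }
}
```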
.NET regular expressions can parse this string fairly easily. The following pattern (written for a C# verbatim string, which is why each quote is doubled as "") should match everything up to the comment:
\A # Start of the string
(?>
(?<Quoted> # A quoted string
"" # Open quotes
[^""\\]* # non quotes or backslashes
(?:\\.[^""\\]*)* # but allow escaped characters
"" # Close quotes
)
|
(?<Number> # A number
\d+ # some digits
)
|
\s+ # Whitespace separator
)*
If you also want to match the comment, add:
(?<Comment>
\# .*
)?
\z
You can get your numbers in a single Match, using all captures of the "Number" group:
Match parsed = Regex.Match(s, pattern, RegexOptions.IgnorePatternWhitespace);
CaptureCollection numbers = parsed.Groups["Number"].Captures;
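Put together, the full pattern and the capture loop might look like this in C#. This is a sketch: the pattern is the one above in a verbatim string (where "" stands for a single quote character), with the optional comment group and \z appended; TokenParser and Numbers are hypothetical names, and the method returns an empty list when the line doesn't fit the grammar.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper around the answer's pattern.
public static class TokenParser
{
    private const string Pattern = @"
        \A
        (?>
            (?<Quoted> "" [^""\\]* (?:\\.[^""\\]*)* "" )  # quoted string, \-escapes allowed
          |
            (?<Number> \d+ )                               # a number
          |
            \s+                                            # whitespace separator
        )*
        (?<Comment> \# .* )?                               # optional trailing comment
        \z";

    public static List<string> Numbers(string line)
    {
        Match parsed = Regex.Match(line, Pattern, RegexOptions.IgnorePatternWhitespace);
        var numbers = new List<string>();
        if (!parsed.Success)
            return numbers; // e.g. an unquoted word appears before '#'
        // Every repetition of the Number group leaves one Capture behind.
        foreach (Capture c in parsed.Groups["Number"].Captures)
            numbers.Add(c.Value);
        return numbers;
    }
}
```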
What this pattern mainly lacks is support for unquoted string tokens, such as 4 8 this 15that, which can add some complexity depending on how they should be handled.