How to find all matches in a string - C#

Assume that I have the following string:
xx##a#11##yyy##bb#2##z
I'm trying to retrieve all occurrences of ##something#somethingElse##
(In my string I want to have 2 matches: ##a#11## and ##bb#2##)
I tried to get all matches using
Regex.Matches(MyString, ".*(##.*#.*##).*")
but it retrieves one match which is the whole row.
How can I get all matches from this string? Thanks.

Because your pattern starts and ends with .*, the only match you get is the whole line. Besides, the .* between the # characters is greedy, so when several expected matches sit on a single line it grabs them all into one match.
You may use
var results = Regex.Matches(MyString, "##[^#]*#[^#]*##")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
See the regex demo
NOTE: If there must be at least 1 char in between ## and #, and # and ##, replace * quantifier (matching 0+ occurrences) with + quantifier (matching 1+ occurrences).
NOTE2: To avoid matches inside ####..#....#####, you may add lookarounds: "(?<!#)##[^#]+#[^#]+##(?!#)"
Pattern details:
## - 2 # symbols
[^#]* / [^#]+ - a negated character class matching 0+ chars (or 1+ chars) other than #
# - a single #
[^#]* / [^#]+ - 0+ (or 1+) chars other than #
## - double # symbol.
BONUS: To get the contents inside ## and ##, use a capturing group, a pair of unescaped (...) around the part of the pattern you need to extract, and grab Match.Groups[1].Value:
var results = Regex.Matches(MyString, @"##([^#]*#[^#]*)##")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
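
For reference, here is a minimal, self-contained sketch of both variants from this answer run against the string from the question. The class and method names (HashMatchDemo, FindTokens, FindContents) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HashMatchDemo
{
    // Returns every ##x#y## token; [^#]* cannot run past a '#',
    // so each token ends at its own closing ##.
    public static List<string> FindTokens(string input)
    {
        return Regex.Matches(input, "##[^#]*#[^#]*##")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    // Same pattern with a capturing group: returns only the text
    // between the outer ## pairs.
    public static List<string> FindContents(string input)
    {
        return Regex.Matches(input, "##([^#]*#[^#]*)##")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        const string sample = "xx##a#11##yyy##bb#2##z";
        Console.WriteLine(string.Join(", ", FindTokens(sample)));   // ##a#11##, ##bb#2##
        Console.WriteLine(string.Join(", ", FindContents(sample))); // a#11, bb#2
    }
}
```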

Regex101
Regex.Matches(MyString, "(##[^#]+#[^#]+##)")
(##[^#]+#[^#]+##)
Description
1st Capturing Group (##[^#]+#[^#]+##)
## matches the characters ## literally (case sensitive)
Match a single character not present in the list below [^#]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
# is the only character in the list (case sensitive)
# matches the character # literally (case sensitive)
Match a single character not present in the list below [^#]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
# is the only character in the list (case sensitive)
## matches the characters ## literally (case sensitive)
Debuggex Demo

Related

Find regex pattern match string have multiple condition?

I have some strings formatted as follows:
1=case1,case2,..caseN;2=case1,..,caseN;3=case1, ..,caseN
Note: a comma "," separates cases and a semicolon ";" separates groups. case1, case2 can be anything (strings, numbers); their type doesn't matter.
I want to find regex pattern to match string
1=home,house;2=abc;3=2019,2021
however, it will not match the following:
1=home,;2=abc;3=2019,2021 (Excess comma mark at case 1)
1=;2=abc,2012;3= (must 1=..; not 1=;)
1=home,age;2 (must 2=.. not 2)
2=home;;3=sea (must ;3 not ;;3)
4=flower;k3=sea (must 3= , not k3)
I tried the pattern (\d+={1}[^;]+;). However, it still matches when the rest of the string is invalid.
Please show me the way.
Many thanks!

Maybe this pattern helps you out:
^\b(?:(?:^|;)\d+=[^,;]+(?:,[^,;]+)*)+$
See the Online Demo
^ - Start string anchor.
\b - Word-boundary.
(?: - Opening 1st non-capture group.
(?:- Opening 2nd non-capture group.
^|; - Alternation between start string anchor or semicolon.
) - Closing 2nd non-capture group.
\d+= - One or more digits followed by a =.
[^,;]+ - Negated character class, any character other than comma or semicolon one or more times.
(?: - Opening 3rd non-capture group.
, - A comma.
[^,;]+ - Negated character class, any character other than comma or semicolon one or more times.
)* - Close 3rd non-capture group and make it match zero or more times.
)+ - Close 1st non-capture group and make it match one or more times.
$ - End string anchor.
Note: I went with a negated character class since you mentioned "case1, case2 are anything like strings, number doesn't matter their type". Therefore I assumed the values can contain spaces, special characters, or anything other than a comma or semicolon.
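
To check the pattern from this answer against the examples in the question, a small sketch like the following could be used. CaseListValidator and IsValid are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseListValidator
{
    // The pattern from this answer, as a verbatim string so the
    // backslashes do not need doubling.
    private static readonly Regex Pattern =
        new Regex(@"^\b(?:(?:^|;)\d+=[^,;]+(?:,[^,;]+)*)+$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        string[] samples =
        {
            "1=home,house;2=abc;3=2019,2021", // valid
            "1=home,;2=abc;3=2019,2021",      // trailing comma in group 1
            "1=;2=abc,2012;3=",               // empty groups
            "1=home,age;2",                   // missing '=' after 2
            "2=home;;3=sea",                  // doubled semicolon
            "4=flower;k3=sea"                 // key is not a number
        };

        foreach (string s in samples)
            Console.WriteLine($"{s} -> {IsValid(s)}");
    }
}
```

Only the first sample should print True; the other five are the rejected cases listed in the question.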

This works on regex101
^(?:\d=(?:\w{1,},)*(?:\w{1,});)*(?:\d=(?:\w{1,},)*\w{1,})$

^(?:\d+=[a-z\d]+(?:,[a-z\d]+)*(?:;|$))+$
Demo
^ : match beginning of string
(?: : begin nc group
\d+=[a-z\d]+ : match 1+ digits, then '=' then 1+ lc letters or digits
(?:,[a-z\d]+) : match ',' then 1+ lc letters or digits in nc group
* : execute nc group 0+ times
(?:;|$) : match ';' or end of string
)+ : end nc group and execute 1+ times
$ : match end of string

I don't know if C# supports recursive patterns, but, if it does, use:
^(\d+=\w+(?:,\w+)*)(?:;(?1))*$
if it doesn't:
^\d+=\w+(?:,\w+)*(?:;\d+=\w+(?:,\w+)*)*$
Demo & explanation
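
The .NET regex engine has no (?1) subroutine call, so in C# the second, expanded form is the one to use. One way to avoid typing the repeated part twice is to build the pattern from a shared fragment. This is only a sketch; GroupListPattern, Item and IsValid are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupListPattern
{
    // One "number=case,case,..." group. Reused twice below, which
    // stands in for the (?1) call of the recursive version.
    private const string Item = @"\d+=\w+(?:,\w+)*";

    // Expands to ^\d+=\w+(?:,\w+)*(?:;\d+=\w+(?:,\w+)*)*$
    private static readonly Regex Pattern =
        new Regex("^" + Item + "(?:;" + Item + ")*$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsValid("1=home,house;2=abc;3=2019,2021")); // True
        Console.WriteLine(IsValid("1=home,;2=abc;3=2019,2021"));      // False
        Console.WriteLine(IsValid("2=home;;3=sea"));                  // False
    }
}
```

Note that \w is stricter than the negated class used in the first answer: values containing spaces or punctuation are rejected here.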

Regular Expression to match dot separated list allowing one or more word

I need a regex that allows an input string with one or more words, but the list must be separated with only a dot. For example:
test = OK
test.test = OK
test.test.1 = OK
test#test = NO
test_test = NO
test-test1 = NO
test. = NO
My regex works, but accepts also other symbols, such as -
^[a-z0-9*.\-_\.:]+$

If you want to ensure that it doesn't begin or end with a dot, use ^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*$
Explanation:
^ - Match beginning of input
[a-zA-Z0-9]+ - Match alphanumeric sequence
(?: - Beginning of a non-capturing group
\. - Match a single .
[a-zA-Z0-9]+ - Match an alphanumeric sequence
) - Close the group
* - Repeat previous group any number of times
$ - Match end of input
You can also replace [a-zA-Z0-9] with [^\W_], as this will match any character that's not non-word and also not underscore. Basically \w minus the _ character.
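
As a quick check, the pattern can be run over the examples from the question with a sketch like this. DotListValidator and IsValid are hypothetical names used only here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotListValidator
{
    // Alphanumeric words separated by single dots; no leading or
    // trailing dot, no other separators.
    private static readonly Regex Pattern =
        new Regex(@"^[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*$");

    public static bool IsValid(string input) => Pattern.IsMatch(input);

    public static void Main()
    {
        string[] samples =
        {
            "test", "test.test", "test.test.1",               // expected OK
            "test#test", "test_test", "test-test1", "test."   // expected NO
        };

        foreach (string s in samples)
            Console.WriteLine($"{s} = {(IsValid(s) ? "OK" : "NO")}");
    }
}
```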

Try ^[A-Za-z0-9]+(\.[A-Za-z0-9]+)*$
[A-Za-z0-9]+ match a word (letters or numbers)
(\.[A-Za-z0-9]+)* - match any following words separated by a dot
Demo

How to extract text with line continuation using Regex?

How can I use Regex to extract the following from a source string that uses the line continuation character "_"? Note that the line continuation character must be the last character on its line. Also, the search should start from the end of the string and terminate at the first "(" encountered, because I am only interested in what's happening at the end of the text.
Wanted Output:
var1, _
var2, _
var3
Source:
...
Func(var1, _
var2, _
var3

Try this
(?<=Func\()(?<match>(?:[^\r\n]+_\r\n)+[^\r\n]+)
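
A small sketch for trying the pattern out in C#. It assumes the source uses Windows line endings (\r\n), as the pattern does, and that the call is literally named Func. ContinuationExtractor and Extract are invented names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContinuationExtractor
{
    private static readonly Regex Pattern =
        new Regex(@"(?<=Func\()(?<match>(?:[^\r\n]+_\r\n)+[^\r\n]+)");

    // Returns the continued argument list after "Func(", or an empty
    // string when the pattern does not match.
    public static string Extract(string source)
    {
        Match m = Pattern.Match(source);
        return m.Success ? m.Groups["match"].Value : string.Empty;
    }

    public static void Main()
    {
        // Assumed sample input, with \r\n after each continuation "_".
        string source = "...\r\nFunc(var1, _\r\nvar2, _\r\nvar3";
        Console.WriteLine(Extract(source));
    }
}
```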
Explanation
@"
(?<= # Assert that the regex below can be matched, with the match ending at this position (positive lookbehind)
Func # Match the characters “Func” literally
\( # Match the character “(” literally
)
(?<match> # Match the regular expression below and capture its match into backreference with name “match”
(?: # Match the regular expression below
[^\r\n] # Match a single character NOT present in the list below
# A carriage return character
# A line feed character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
_ # Match the character “_” literally
\r # Match a carriage return character
\n # Match a line feed character
)+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
[^\r\n] # Match a single character NOT present in the list below
# A carriage return character
# A line feed character
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
"

C# Regular Expression excluding a string

I have a collection of strings, and all I want the regex to do is collect everything that starts with http.
href="http://www.test.com/cat/1-one_piece_episodes/"href="http://www.test.com/cat/2-movies_english_subbed/"href="http://www.test.com/cat/3-english_dubbed/"href="http://www.exclude.com"
this is my regular expression pattern..
href="(.*?)[^#]"
and return this
href="http://www.test.com/cat/1-one_piece_episodes/"
href="http://www.test.com/cat/2-movies_english_subbed/"
href="http://www.xxxx.com/cat/3-english_dubbed/"
href="http://www.exclude.com"
What is the pattern for excluding the last match, or for excluding matches that contain the exclude domain, like href="http://www.exclude.com"?
EDIT:
for multiple exclusion
href="((?:(?!"|\bexclude\b|\bxxxx\b).)*)[^#]"

@ridgerunner and I would change the regex to:
href="((?:(?!\bexclude\b)[^"])*)[^#]"
It matches all href attributes as long as they don't end in # and don't contain the word exclude.
Explanation:
href=" # Match href="
( # Capture...
(?: # the following group:
(?! # Look ahead to check that the next part of the string isn't...
\b # the entire word
exclude # exclude
\b # (\b are word boundary anchors)
) # End of lookahead
[^"] # If successful, match any character except for a quote
)* # Repeat as often as possible
) # End of capturing group 1
[^#]" # Match a non-# character and the closing quote.
To allow multiple "forbidden words":
href="((?:(?!\b(?:exclude|this|too)\b)[^"])*)[^#]"
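
In C# source the quotes inside the pattern have to be escaped. The sketch below uses verbatim strings (where a literal quote is written as "") and builds the list of forbidden words at run time. HrefFilter and Find are hypothetical names for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HrefFilter
{
    // Returns every href="..." attribute whose value contains none
    // of the forbidden words (matched as whole words).
    public static List<string> Find(string input, params string[] forbidden)
    {
        string words = string.Join("|", forbidden.Select(Regex.Escape));

        // Expands to: href="((?:(?!\b(?:w1|w2)\b)[^"])*)[^#]"
        string pattern =
            @"href=""((?:(?!\b(?:" + words + @")\b)[^""])*)[^#]""";

        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    public static void Main()
    {
        string input =
            "href=\"http://www.test.com/cat/1-one_piece_episodes/\"" +
            "href=\"http://www.test.com/cat/2-movies_english_subbed/\"" +
            "href=\"http://www.test.com/cat/3-english_dubbed/\"" +
            "href=\"http://www.exclude.com\"";

        // Prints the three test.com attributes, one per line.
        foreach (string href in Find(input, "exclude"))
            Console.WriteLine(href);
    }
}
```

Because the final character before the closing quote is consumed by [^#], capturing group 1 is one character short of the full URL, which is why the sketch returns m.Value rather than m.Groups[1].Value.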

Your input doesn't look like a valid string (unless you escape the quotes in it), but you can also do it without regex:
string input = "href=\"http://www.test.com/cat/1-one_piece_episodes/\"href=\"http://www.test.com/cat/2-movies_english_subbed/\"href=\"http://www.test.com/cat/3-english_dubbed/\"href=\"http://www.exclude.com\"";
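List<string> matches = new List<string>();
// Split on "href"; RemoveEmptyEntries drops the empty piece before the first one.
foreach (var match in input.Split(new string[] { "href" }, StringSplitOptions.RemoveEmptyEntries)) {
    // Keep every piece that does not mention the excluded domain.
    if (!match.Contains("exclude.com"))
        matches.Add("href" + match);
}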

Will this do the job?
href="(?!http://[^/"]+exclude.com)(.*?)[^#]"

Regular expression to find separator dots in formula

The C# expression library I am using will not directly support my table/field parameter syntax:
The following are table/field parameter names that are not directly supported:
TableName1.FieldName1
[TableName1].[FieldName1]
[Table Name 1].[Field Name 1]
It accepts alphanumeric parameters without spaces, or most characters enclosed within square brackets. I would like to use C# regular expressions to replace the dot separators and neighboring brackets to a different delimiter, so the results would be as follows:
[TableName1|FieldName1]
[TableName1|FieldName1]
[Table Name 1|Field Name 1]
I also need to skip any string literals within single quotes, like:
'TableName1.FieldName1'
And, of course, ignore any numeric literals like:
12345.6789
EDIT: Thank you for your feedback on improving my question. Hopefully it is clearer now.

I've written a completely new answer, now that the problem is clarified:
You can do this in a single regex. It is quite bulletproof, I think, but as you can see, it's not exactly self-explanatory, which is why I've commented it liberally. Hope it makes sense.
You're lucky that .NET allows re-use of named capturing groups, otherwise you would have had to do this in several steps.
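
Before the commented version, here is a condensed sketch for trying the replacement against the examples from the question. It uses the same pattern with the whitespace and comments stripped, so RegexOptions.IgnorePatternWhitespace is not needed. DotSeparatorRewriter and Rewrite are names made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DotSeparatorRewriter
{
    // Same pattern as the commented version, on one line. Both
    // alternatives capture into the same <before>/<after> names.
    private const string Pattern =
        @"(?:(?<before>(?=\w*\p{L})\w+)\.(?<after>(?=\w*\p{L})\w+)" +
        @"|\[(?<before>[^\]]+)\]\.\[(?<after>[^\]]+)\])" +
        @"(?=[^']*$|[^']*'[^']*'[^']*$)";

    public static string Rewrite(string input)
    {
        return Regex.Replace(input, Pattern, "[${before}|${after}]",
                             RegexOptions.Multiline);
    }

    public static void Main()
    {
        Console.WriteLine(Rewrite("TableName1.FieldName1"));         // [TableName1|FieldName1]
        Console.WriteLine(Rewrite("[TableName1].[FieldName1]"));     // [TableName1|FieldName1]
        Console.WriteLine(Rewrite("[Table Name 1].[Field Name 1]")); // [Table Name 1|Field Name 1]
        Console.WriteLine(Rewrite("'TableName1.FieldName1'"));       // unchanged
        Console.WriteLine(Rewrite("12345.6789"));                    // unchanged
    }
}
```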
resultString = Regex.Replace(subjectString,
@"(?: # Either match...
(?<before> # (and capture into backref <before>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters,
) # (End of capturing group <before>).
\. # then a literal dot,
(?<after> # (now capture again, into backref <after>)
(?=\w*\p{L}) # (as long as it contains at least one letter):
\w+ # one or more alphanumeric characters.
) # (End of capturing group <after>) and end of match.
| # Or:
\[ # Match a literal [
(?<before> # (now capture into backref <before>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <before>).
\]\.\[ # Match literal ].[
(?<after> # (capture into backref <after>)
[^\]]+ # one or more characters except ]
) # (End of capturing group <after>).
\] # Match a literal ]
) # End of alternation. The match is now finished, but
(?= # only if the rest of the line matches either...
[^']*$ # only non-quote characters
| # or
[^']*'[^']*' # contains an even number of quote characters
[^']* # plus any number of non-quote characters
$ # until the end of the line.
) # End of the lookahead assertion.",
"[${before}|${after}]", RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
