Way to not include something in regex capture group - c#

Given:
var input = "test <123>";
Regex.Matches(input, "<.*?>");
Result:
<123>
This gives me the result I want, but it includes the angle brackets. That is OK because I can easily do a search and replace, but I was wondering whether there is a way to exclude them in the expression itself.

You need to use a capturing group:
var input = "test <123>";
var results = Regex.Matches(input, "<(.*?)>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
The m.Groups[1].Value lets you get the capturing group #1 value.
A more efficient regex is <([^>]*)>: it matches <, then captures into Group 1 zero or more characters other than >, and then matches >.
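A minimal sketch of the negated-character-class version applied to an input with more than one bracketed value. The sample string and the variable name `values` are made up for illustration; only the pattern comes from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input with two bracketed values.
var input = "test <123> and <abc>";

// Group 1 holds the text between < and >, without the brackets.
var values = Regex.Matches(input, @"<([^>]*)>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", values)); // 123, abc
```

Because [^>]* can never run past a closing >, the engine does not need to test for > after every character the way the lazy .*? does.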

Related

RegExp- How to get words from entire document that match the expression [duplicate]

I'm trying to extract values from a string which are between << and >>. But they could happen multiple times.
Can anyone help with the regular expression to match these;
this is a test for <<bob>> who like <<books>>
test 2 <<frank>> likes nothing
test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.
I then want to foreach the GroupCollection to get all the values.
Any help greatly received.
Thanks.
Use a positive look ahead and look behind assertion to match the angle brackets, use .*? to match the shortest possible sequence of characters between those brackets. Find all values by iterating the MatchCollection returned by the Matches() method.
Regex regex = new Regex("(?<=<<).*?(?=>>)");
foreach (Match match in regex.Matches(
"this is a test for <<bob>> who like <<books>>"))
{
Console.WriteLine(match.Value);
}
While Peter's answer is a good example of using lookarounds to check the left-hand and right-hand context, I'd also like to add a LINQ (lambda) way to access matches and groups, and show the use of simple numbered capturing groups, which come in handy when you want to extract only a part of the pattern:
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
// ...
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(x => x.Groups[1].Value);
Same approach with Peter's compiled regex where the whole match value is accessed via Match.Value:
var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);
Note:
<<(.*?)>> is a regex matching <<, then capturing any 0 or more chars as few as possible (due to the non-greedy *? quantifier) into Group 1 and then matching >>
RegexOptions.Singleline makes . match newline (LF) chars, too (it does not match them by default)
Cast<Match>() casts the match collection to an IEnumerable<Match> that you may further access using a lambda
Select(x => x.Groups[1].Value) only returns the Group 1 value from the current x match object
Note that you may further create a list or an array of the obtained values by adding .ToList() or .ToArray() after Select.
In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:
var strs = new List<string> { "this is a test for <<bob>> who like <<books>>",
    "test 2 <<frank>> likes nothing",
    "test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>." };
foreach (var s in strs)
{
    var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
        .Cast<Match>()
        .Select(x => x.Groups[1].Value);
    Console.WriteLine(string.Join(", ", results));
}
Output:
bob, books
frank
what, on, earth, this, is, too, much
You can try one of these:
(?<=<<)[^>]+(?=>>)
(?<=<<)\w+(?=>>)
However you will have to iterate the returned MatchCollection.
Something like this:
(<<(?<element>[^>]*)>>)*
This program might be useful:
http://sourceforge.net/projects/regulator/
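The named group in the pattern suggested above can be read by name rather than by index. This is only a sketch, and it assumes the outer (...)* wrapper is dropped so that each <<...>> pair becomes its own match. The input string and the `elements` list are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var s = "test 3 <<what>> <<on>> <<earth>>";
var elements = new List<string>();

// Each match is one <<...>> pair; the named group "element" holds the inner text.
foreach (Match m in Regex.Matches(s, @"<<(?<element>[^>]*)>>"))
{
    elements.Add(m.Groups["element"].Value);
}

Console.WriteLine(string.Join(", ", elements)); // what, on, earth
```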

Retrieve different groups of values in a regex

I have this following string :
((1+2)*(4+3))
I would like to get the values enclosed in parentheses separately through a regex. These values must be in an array, such as a string array.
For example :
Group 1 : ((1+2)*(4+3))
Group 2 : (1+2)
Group 3 : (4+3)
I have tried this Regex :
(?<content>\(.+\))
But it does not work, because it only returns Group 1.
Do you have a solution that would let me handle this recursively?
You may get all overlapping substrings starting with ( and ending with ) and having any amount of balanced nested parentheses inside using
var result = Regex.Matches(s, @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))").Cast<Match>().Select(x => x.Groups[1].Value);
Regex details
The regex is a positive lookahead ((?=...)) that checks each position within a string and finds a match if its pattern matches. Since the pattern is enclosed with a capturing group ((...)) the value is stored in match.Groups[1] that you may retrieve once the match is found. \((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\) is a known pattern that matches nested balanced parentheses.
C# demo:
var str = "((1+2)*(4+3))";
var pattern = @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))";
var result = Regex.Matches(str, pattern)
    .Cast<Match>()
    .Select(x => x.Groups[1].Value);
Console.WriteLine(string.Join("\n", result));
Output:
((1+2)*(4+3))
(1+2)
(4+3)

C# regex. Everything inside curly brackets {} and mod (%) characters

I'm trying to get the values between {} and %% in a same Regex.
This is what I have so far. I can successfully get the values individually for each, but I was curious to learn how I can combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
Similar to your regex, you can use named capturing groups, giving both alternatives the same group name:
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
    .Cast<Match>()
    .Select(m => m.Groups["name"].Value)
    .ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
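An alternative to concatenating the two groups is to check which group actually took part in the match through Group.Success. This is a sketch that reuses the question's pattern and input. The `values` list and the local `g` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var values = new List<string>();

foreach (Match match in regex.Matches(s))
{
    // Only one of the two groups participates in any given match.
    var g = match.Groups[1].Success ? match.Groups[1] : match.Groups[2];
    values.Add(g.Value);
}

Console.WriteLine(string.Join(", ", values)); // test, String, Stack, Overflow
```

Checking Success makes it explicit which alternative matched, which helps if the two kinds of delimiters ever need different handling.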
@"[{%](.*?)[}%]"
The idea being:
{ or %
anything
} or %
I think you should use a combination of alternation and nested groups:
((\{(.*?)\})|(%(.*?)%))

Parse a string, keeping all of the matches in between given strings (multi-character delimiters)

This is very similar to the question "How do I extract text that lies between parentheses (round brackets)?", where I see this regex code:
var matches = Regex.Matches("User name [[sales]] and [[anotherthing]]", @"\[\[([^)]*)\]\]");
But that doesn't seem to work with multi-character delimiters? This might not even be the correct way to go, but I am sure I am not the first to try this and I am drawing a blank here - anyone?
Your @"\[\[([^)]*)\]\]" pattern matches two consecutive [[, followed with zero or more characters other than a ) and then followed with two ]]. That means, if you have a ) inside [[...]], there won't be a match.
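The failure described above is easy to reproduce. The sketch below uses the question's pattern. The two test strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

var pattern = @"\[\[([^)]*)\]\]";

// No ) between the delimiters: the pattern matches.
var plain = Regex.IsMatch("[[sales]]", pattern);

// A ) between the delimiters stops [^)]* before it can reach ]].
var withParen = Regex.IsMatch("[[sales (EU)]]", pattern);

Console.WriteLine(plain);     // True
Console.WriteLine(withParen); // False
```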
To deal with multicharacter-delimited substrings, you can use 2 things: either lazy dot matching, or unrolled patterns.
Note: to get multiple matches, use Regex.Matches as I wrote in my other answer.
1. Lazy dot solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
The RegexOptions.Singleline modifier is necessary for the . to match newline characters.
2. Unrolled regex solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
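With this one, RegexOptions.Singleline is not necessary because [^]] already matches newlines, and the pattern is usually more efficient because it consumes runs of non-] characters in one step instead of testing for ]] after every character.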
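To see that the two approaches agree, here is a small comparison on an input that contains a single ] inside the delimiters. This is a sketch: the input string and the variable names `lazy` and `unrolled` are illustrative, while both patterns are taken from the solutions above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the first value contains a lone ] character.
var s = "a [[x]y]] b [[z]]";

// Lazy dot: expand one character at a time until ]] follows.
var lazy = Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

// Unrolled: runs of non-] characters, plus any ] not followed by another ].
var unrolled = Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

Console.WriteLine(string.Join(", ", lazy));      // x]y, z
Console.WriteLine(string.Join(", ", unrolled));  // x]y, z
Console.WriteLine(lazy.SequenceEqual(unrolled)); // True
```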
Use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
Sample code:
var matches = Regex.Matches("User name (sales) and (anotherthing)", @"\(([^)]*)\)")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
