I have the following string:
((1+2)*(4+3))
I would like to get each parenthesized value separately with a regex. The values must end up in an array, such as a string array.
For example:
Group 1 : ((1+2)*(4+3))
Group 2 : (1+2)
Group 3 : (4+3)
I have tried this regex:
(?<content>\(.+\))
But it doesn't work, because it only returns Group 1 (the whole string).
Is there a solution that would let me handle this recursively?
You can get all overlapping substrings that start with ( and end with ) and contain any number of balanced nested parentheses using:
var result = Regex.Matches(s, @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))").Cast<Match>().Select(x => x.Groups[1].Value);
Regex details
The regex is a positive lookahead ((?=...)) that is tested at every position in the string and succeeds, without consuming any characters, wherever its pattern matches. This is what allows the results to overlap. Because the pattern inside it is wrapped in a capturing group ((...)), the matched text is stored in match.Groups[1], which you can read once the match is found. \((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\) is a well-known .NET pattern that matches nested balanced parentheses using balancing groups.
C# demo:
using System;
using System.Linq;
using System.Text.RegularExpressions;
var str = "((1+2)*(4+3))";
var pattern = @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))";
var result = Regex.Matches(str, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Console.WriteLine(string.Join("\n", result));
Output:
((1+2)*(4+3))
(1+2)
(4+3)
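The heart of this pattern is the balancing-group construct: (?<o>)\( pushes an empty capture onto the stack named o at each (, (?<-o>)\) pops one at each ) and fails if the stack is already empty, and (?(o)(?!)|) fails if any ( is left unclosed. As a minimal sketch of just that mechanism (the IsBalanced helper and the extra sample inputs are hypothetical, not part of the answer), the core can be anchored with ^...$ to check whether a whole string is balanced:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the balancing-group core from the answer, anchored with ^...$
// so the entire string must consist of properly nested parentheses and other text.
bool IsBalanced(string s) =>
    Regex.IsMatch(s, @"^(?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)$");

Console.WriteLine(IsBalanced("((1+2)*(4+3))")); // True
Console.WriteLine(IsBalanced("((1+2)*(4+3)"));  // False: one ( is never closed
Console.WriteLine(IsBalanced("(1+2))"));        // False: the last ) has nothing to pop
```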
Related
Given:
var input = "test <123>";
Regex.Matches(input, "<.*?>");
Result:
<123>
This gives me the result I want, but it includes the angle brackets. That is OK, because I can easily do a search and replace, but I was wondering whether there is a way to handle that in the expression itself.
You need to use a capturing group:
var input = "test <123>";
var results = Regex.Matches(input, "<(.*?)>")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
m.Groups[1].Value returns the value of capturing group #1.
A better, more efficient regex is <([^>]*)>: it matches <, then matches and captures into Group 1 zero or more characters other than >, and then matches >.
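A small sketch comparing the capturing-group answer with a lookaround variant that keeps the brackets out of the match value entirely, which is one way to do it "in the expression". The lookaround pattern (?<=<)[^>]*(?=>) and the extended sample input are assumptions, not part of the answer:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "test <123> and <abc>";

// Approach from the answer: capture the inner text into Group 1.
var viaGroup = Regex.Matches(input, "<([^>]*)>")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

// Alternative (not from the answer): lookarounds check for < and > without consuming them,
// so m.Value itself contains only the inner text.
var viaLookaround = Regex.Matches(input, "(?<=<)[^>]*(?=>)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToList();

Console.WriteLine(string.Join(",", viaGroup));      // 123,abc
Console.WriteLine(string.Join(",", viaLookaround)); // 123,abc
```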
I'm trying to get the values between {} and %% with the same regex.
This is what I have so far. I can get the values with each pattern individually, but I would like to learn how to combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected output for the above string:
test
String
Stack
Overflow
Individual regexes:
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Here is my code:
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
This is similar to your regex, but uses named capturing groups; .NET allows both alternatives to share the same group name:
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
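A variant of the same fix, sketched under the assumption that you would rather not rely on the unused group being empty: check Group.Success to pick whichever alternative actually matched. The values list is only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var values = new List<string>();

foreach (Match match in regex.Matches(s))
{
    // Only the alternative that matched has a successful group;
    // the other group reports Success == false and an empty Value.
    var value = match.Groups[1].Success ? match.Groups[1].Value : match.Groups[2].Value;
    values.Add(value);
    Console.WriteLine(value);
}
```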
@"[{%](.*?)[}%]"
The idea being (a | inside a character class is a literal character, not alternation, so it is left out):
{ or %
anything
} or %
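The character-class version does not enforce that the closing delimiter fits the opening one. A quick comparison sketch against the conditional pattern from the earlier answer (the Extract helper and the sample inputs are hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: runs a pattern and projects each match with the given selector.
string[] Extract(string input, string pattern, Func<Match, string> select) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(select).ToArray();

var charClass = @"[{%](.*?)[}%]";                           // any opener, any closer
var conditional = @"(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})"; // closer must fit the opener

var good = "{a} %b%";
var bad = "{x%"; // made-up input with mismatched delimiters

Console.WriteLine(string.Join(",", Extract(good, charClass, m => m.Groups[1].Value)));     // a,b
Console.WriteLine(string.Join(",", Extract(good, conditional, m => m.Groups["v"].Value))); // a,b
Console.WriteLine(Extract(bad, charClass, m => m.Groups[1].Value).Length);                 // 1: "{x%" accepted
Console.WriteLine(Extract(bad, conditional, m => m.Groups["v"].Value).Length);             // 0: rejected
```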
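I think you should use a combination of alternation and nested groups, with lazy quantifiers so that each match stops at the first closing delimiter (the values end up in groups 3 and 5):
((\{(.*?)\})|(%(.*?)%))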
This is very similar to the question "How do I extract text that lies between parentheses (round brackets)?", where I saw this regex code:
var matches = Regex.Matches("User name [[sales]] and [[anotherthing]]", @"\[\[([^)]*)\]\]");
But that doesn't seem to work with multi-character delimiters. This might not even be the right approach, but I am sure I am not the first to try this and I am drawing a blank here. Any ideas?
Your @"\[\[([^)]*)\]\]" pattern matches two consecutive [ characters, followed by zero or more characters other than ), followed by two ] characters. That means that if there is a ) inside [[...]], there won't be a match.
To deal with substrings delimited by multi-character strings, you can use one of two approaches: lazy dot matching or unrolled patterns.
Note: to get multiple matches, use Regex.Matches as I wrote in my other answer.
1. Lazy dot solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
The RegexOptions.Singleline option is necessary for the . to match newline characters.
2. Unrolled regex solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
With this pattern, RegexOptions.Singleline is not needed (the negated class [^]] also matches newlines), and it is more efficient than the lazy dot version.
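A side-by-side sketch of both approaches and the original pattern from the question, on a made-up input whose first value contains a single ] and a ) (the Group1 helper is hypothetical):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the Group 1 value of every match.
string[] Group1(string input, string pattern, RegexOptions options) =>
    Regex.Matches(input, pattern, options)
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToArray();

// Made-up input: the first value contains a single ] and a ).
var s = "x [[a]b)]] y [[c]]";

var lazy = Group1(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline);
var unrolled = Group1(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}", RegexOptions.None);
var original = Group1(s, @"\[\[([^)]*)\]\]", RegexOptions.None);

Console.WriteLine(string.Join(" | ", lazy));     // a]b) | c
Console.WriteLine(string.Join(" | ", unrolled)); // a]b) | c
Console.WriteLine(string.Join(" | ", original)); // c  (the ) blocks the first value)
```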
Use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
Sample code:
var matches = Regex.Matches("User name (sales) and (anotherthing)", @"\(([^)]*)\)")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for regexes; they avoid double escaping and make patterns much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
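A quick sketch reproducing the problem and the fix on the input from the question (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

var httpData = "level=<device[195].level>&name=<device[195].name>";

// Greedy \S* runs on to the last '>' in the string, so both tags become one match.
var greedy = Regex.Matches(httpData, @"<device\[[0-9]*\]\.\S*>");
// Lazy \S+? stops at the first '>' after the property name.
var lazy = Regex.Matches(httpData, @"<device\[\d+\]\.\S+?>");

Console.WriteLine(greedy.Count); // 1
Console.WriteLine(lazy.Count);   // 2
foreach (Match m in lazy)
    Console.WriteLine(m.Value);  // <device[195].level>, then <device[195].name>
```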
It depends on how much of the structure inside the angle brackets you need to match, but you can use:
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched value from group index 1:
(<device[^>]*>)
String literals for use in programs:
@"(<device[^>]*>)"
Change the * repetition operators to + and use \w instead of \S (\w cannot match >, so each match stops at the end of its tag):
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, pattern))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named groups and project each match into an anonymous object with LINQ. There will be two matches, one per item:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device'
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// IgnorePatternWhitespace lets the pattern be laid out and commented; the whitespace and # comments are not matched.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
I've inherited a code block that contains the following regex and I'm trying to understand how it's getting its results.
var pattern = @"\[(.*?)\]";
var matches = Regex.Matches(user, pattern);
if (matches.Count > 0 && matches[0].Groups.Count > 1)
...
For the input user == "Josh Smith [jsmith]":
matches.Count == 1
matches[0].Value == "[jsmith]"
... which I understand. But then:
matches[0].Groups.Count == 2
matches[0].Groups[0].Value == "[jsmith]"
matches[0].Groups[1].Value == "jsmith" <=== how?
Looking at this question, my understanding is that the Groups collection stores the entire match as well as the previous match. But doesn't the regex above match only [open square bracket] [text] [close square bracket]? So why would "jsmith" match?
Also, is it always the case that the Groups collection stores exactly two groups: the entire match and the last match?
match.Groups[0] is always the same as match.Value, which is the entire match.
match.Groups[1] is the first capturing group in your regular expression.
Consider this example:
var pattern = @"\[(.*?)\](.*)";
var match = Regex.Match("ignored [john] John Johnson", pattern);
In this case,
match.Value is "[john] John Johnson"
match.Groups[0] is always the same as match.Value, "[john] John Johnson".
match.Groups[1] is the group of captures from the (.*?), "john".
match.Groups[2] is the group of captures from the (.*), " John Johnson" (with a leading space).
match.Groups[1].Captures is yet another dimension.
Consider another example:
var pattern = @"(\[.*?\])+";
var match = Regex.Match("[john][johnny]", pattern);
Note that we are looking for one or more bracketed names in a row. You need to be able to get each name separately. Enter Captures!
match.Groups[0] is always the same as match.Value, "[john][johnny]".
match.Groups[1] is the group of captures from the (\[.*?\])+. Its Value is only the last capture, "[johnny]", not the whole match.
match.Groups[1].Captures[0] is [john]
match.Groups[1].Captures[1] is [johnny], the same as match.Groups[1].Value
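A runnable sketch of the Captures example above, printing the group value and each individual capture:

```csharp
using System;
using System.Text.RegularExpressions;

var match = Regex.Match("[john][johnny]", @"(\[.*?\])+");

Console.WriteLine(match.Value);                    // [john][johnny]
Console.WriteLine(match.Groups[1].Value);          // [johnny] (last capture only)
Console.WriteLine(match.Groups[1].Captures.Count); // 2
foreach (Capture c in match.Groups[1].Captures)
    Console.WriteLine(c.Value);                    // [john], then [johnny]
```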
The ( ) acts as a capture group. The matches collection holds all of the matches that .NET finds in your string, and each match's Groups collection holds the values of the capture groups inside that match. If you don't want that extra level of capture, just remove the ( ).
Groups[0] is the entire match ("[jsmith]"), not the whole input string.
Groups[1] is the group captured by the parentheses (.*?). You can configure the Regex to capture explicit (named) groups only with RegexOptions.ExplicitCapture when you create it, or use (?:.*?) to create a non-capturing group.
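A minimal sketch contrasting the default behavior, a non-capturing group, and RegexOptions.ExplicitCapture on the input from the question (variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

var user = "Josh Smith [jsmith]";

// Default: (.*?) is capturing, so there are two groups (0 = whole match, 1 = inner text).
var normal = Regex.Match(user, @"\[(.*?)\]");
// Non-capturing group: only Group 0 remains.
var nonCapturing = Regex.Match(user, @"\[(?:.*?)\]");
// ExplicitCapture: unnamed (...) groups stop capturing; named groups still capture.
var explicitOnly = Regex.Match(user, @"\[(.*?)\]", RegexOptions.ExplicitCapture);
var explicitNamed = Regex.Match(user, @"\[(?<name>.*?)\]", RegexOptions.ExplicitCapture);

Console.WriteLine(normal.Groups.Count);                // 2
Console.WriteLine(nonCapturing.Groups.Count);          // 1
Console.WriteLine(explicitOnly.Groups.Count);          // 1
Console.WriteLine(explicitNamed.Groups["name"].Value); // jsmith
```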
The parentheses also define a group, so group 0 is the entire match and group 1 holds the contents found between the square brackets.
How? The answer is here:
(.*?)
That is a subgroup of @"\[(.*?)\]".