I'm using the following LINQ command to extract a list of bracket-delimited parameters from a string using a reluctant regular expression:
var result = Regex.Matches("foo[a][b][cccc]bar", @"(\[.+?])")
.Cast<Match>()
.Select(match => match.ToString())
.ToArray();
This returns the following string array as expected:
- result {string[3]} string[]
[0] "[a]" string
[1] "[b]" string
[2] "[cccc]" string
Is there a way to modify the regular expression itself so that the brackets aren't included in the output? I tried placing the .+ part of the expression inside a named group but it broke the matching. Obviously I could run each result through another regular expression to remove the brackets but I'd like to find out if there's a cleaner/better way to do this.
Yes, you can use look-behind and look-ahead assertions:
(?<=\[)(.+?)(?=])
The code is then:
var result = Regex.Matches("foo[a][b][cccc]bar", @"(?<=\[).+?(?=])")
.Cast<Match>()
.Select(m => m.ToString())
.ToArray();
Please also note that you don't need grouping brackets () in your regex as you're not trying to capture any groups.
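The question mentions that putting .+ inside a named group broke the matching. For comparison, here is a minimal sketch (C# 9+ top-level statements) of the capturing-group alternative: the brackets stay in the pattern and are consumed, but only the group's value is read. The group name param is an arbitrary, hypothetical choice.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The brackets are matched literally; the lazy .+? between them is captured
// in a named group ("param" is just an illustrative name).
string[] parameters = Regex.Matches("foo[a][b][cccc]bar", @"\[(?<param>.+?)]")
    .Cast<Match>()
    .Select(m => m.Groups["param"].Value)
    .ToArray();

Console.WriteLine(string.Join(", ", parameters)); // a, b, cccc
```

For this input both versions return the same values; the lookaround version simply avoids reading a group.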
Related
I'm trying to extract values from a string that appear between << and >>. They can occur multiple times.
Can anyone help with a regular expression to match these:
this is a test for <<bob>> who like <<books>>
test 2 <<frank>> likes nothing
test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>.
I then want to foreach the GroupCollection to get all the values.
Any help gratefully received.
Thanks.
Use positive lookahead and lookbehind assertions to match the angle brackets, and use .*? to match the shortest possible sequence of characters between them. Find all values by iterating over the MatchCollection returned by the Matches() method.
Regex regex = new Regex("(?<=<<).*?(?=>>)");
foreach (Match match in regex.Matches(
"this is a test for <<bob>> who like <<books>>"))
{
Console.WriteLine(match.Value);
}
LiveDemo in DotNetFiddle
While Peter's answer is a good example of using lookarounds for left- and right-hand context checking, I'd also like to add a LINQ (lambda) way to access matches and groups, and to show the use of simple numbered capturing groups, which come in handy when you want to extract only part of the pattern:
using System.Linq;
using System.Collections.Generic;
using System.Text.RegularExpressions;
// ...
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Same approach with Peter's Regex instance, where the whole match value is accessed via Match.Value:
var results = regex.Matches(s).Cast<Match>().Select(x => x.Value);
Note:
<<(.*?)>> is a regex matching <<, then capturing any 0 or more chars as few as possible (due to the non-greedy *? quantifier) into Group 1 and then matching >>
RegexOptions.Singleline makes . match newline (LF) chars, too (it does not match them by default)
Cast<Match>() casts the match collection to an IEnumerable<Match> that you can then query with a lambda
Select(x => x.Groups[1].Value) only returns the Group 1 value from the current x match object
Note that you can also materialize the obtained values as a list or an array by adding .ToList() or .ToArray() after Select.
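To see what RegexOptions.Singleline changes, here is a small sketch (C# 9+ top-level statements) that runs the same <<(.*?)>> pattern with and without the option on a hypothetical input whose second value contains a line break. Values is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the second value contains a line feed.
string s = "<<bob>> and <<two\nlines>>";

string[] Values(RegexOptions options) =>
    Regex.Matches(s, @"<<(.*?)>>", options)
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .ToArray();

// By default . does not match \n, so the second pair is skipped.
string[] withoutSingleline = Values(RegexOptions.None);

// With Singleline, . matches \n as well, so both values are returned.
string[] withSingleline = Values(RegexOptions.Singleline);

Console.WriteLine(withoutSingleline.Length); // 1
Console.WriteLine(withSingleline.Length);    // 2
```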
In the demo C# code, string.Join(", ", results) generates a comma-separated string of the Group 1 values:
var strs = new List<string> { "this is a test for <<bob>> who like <<books>>",
"test 2 <<frank>> likes nothing",
"test 3 <<what>> <<on>> <<earth>> <<this>> <<is>> <<too>> <<much>>." };
foreach (var s in strs)
{
var results = Regex.Matches(s, @"<<(.*?)>>", RegexOptions.Singleline)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Console.WriteLine(string.Join(", ", results));
}
Output:
bob, books
frank
what, on, earth, this, is, too, much
You can try one of these:
(?<=<<)[^>]+(?=>>)
(?<=<<)\w+(?=>>)
However, you will have to iterate over the returned MatchCollection.
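The two patterns differ in what they accept between the delimiters. A minimal sketch (C# 9+ top-level statements, hypothetical input, hypothetical helper Extract) comparing them: neither one matches an empty <<>> pair, because + needs at least one character.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: a value with a space, a value with digits and an
// underscore, and an empty pair.
string s = "<<first name>> and <<id_42>> and <<>>";

string[] Extract(string pattern) =>
    Regex.Matches(s, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// [^>]+ accepts any run of characters other than '>', spaces included.
string[] anyButGt = Extract(@"(?<=<<)[^>]+(?=>>)");

// \w+ accepts only letters, digits and underscores, so "first name" is skipped.
string[] wordOnly = Extract(@"(?<=<<)\w+(?=>>)");

Console.WriteLine(string.Join(" | ", anyButGt)); // first name | id_42
Console.WriteLine(string.Join(" | ", wordOnly)); // id_42
```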
Something like this:
(<<(?<element>[^>]*)>>)*
This program might be useful:
http://sourceforge.net/projects/regulator/
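The pattern above repeats a named group with *. Because the group may repeat zero times, Regex.Matches on the asker's strings also yields many empty matches, so the lookaround or <<(.*?)>> answers are simpler there. Where the repetition does help is with adjacent pairs. Here is a sketch (C# 9+ top-level statements, hypothetical input) of how .NET keeps every repetition in Group.Captures, while Group.Value holds only the last one.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: the pairs are adjacent, so one match covers all of them.
string s = "<<what>><<on>><<earth>>";

Match first = Regex.Match(s, @"(<<(?<element>[^>]*)>>)*");

// Every iteration of the repeated group is kept in Captures.
string[] elements = first.Groups["element"].Captures
    .Cast<Capture>()
    .Select(c => c.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", elements));   // what, on, earth
Console.WriteLine(first.Groups["element"].Value); // earth (only the last capture)
```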
I have the following string:
((1+2)*(4+3))
I would like to get each of the parenthesized values separately with a Regex. These values must end up in an array, such as a string array.
For example:
Group 1 : ((1+2)*(4+3))
Group 2 : (1+2)
Group 3 : (4+3)
I have tried this Regex :
(?<content>\(.+\))
But it doesn't work: it only returns the first group, ((1+2)*(4+3)).
Is there a solution that would let me handle this recursively?
You may get all overlapping substrings starting with ( and ending with ) and having any amount of balanced nested parentheses inside using
var result = Regex.Matches(s, @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))").Cast<Match>().Select(x => x.Groups[1].Value);
See the regex demo online.
Regex details
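The regex is a positive lookahead ((?=...)) that is tested at every position within the string and succeeds wherever its pattern matches; since it consumes no characters, overlapping (nested) values are all found. Since the pattern is enclosed in a capturing group ((...)), the value is stored in match.Groups[1], which you can retrieve once the match is found. \((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\) is a known pattern that matches nested balanced parentheses: (?<o>)\( pushes a capture onto the o stack for every (, (?<-o>)\) pops one for every ) and fails if the stack is empty, and (?(o)(?!)|) makes the match fail if any ( is left unclosed.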
C# demo:
var str = "((1+2)*(4+3))";
var pattern = @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))";
var result = Regex.Matches(str, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Value);
Console.WriteLine(string.Join("\n", result));
Output:
((1+2)*(4+3))
(1+2)
(4+3)
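A quick sketch (C# 9+ top-level statements) that wraps the same pattern in a hypothetical helper, Balanced, to show that the balancing groups only accept balanced pairs. On the question's string it returns the three values above. On a hypothetical unbalanced input, ((a), it returns only the inner (a), because the outer ( is never closed.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same pattern as in the answer above.
const string pattern = @"(?=(\((?>[^()]+|(?<o>)\(|(?<-o>)\))*(?(o)(?!)|)\)))";

string[] Balanced(string input) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(x => x.Groups[1].Value)
        .ToArray();

string[] nested = Balanced("((1+2)*(4+3))");
string[] unbalanced = Balanced("((a)"); // hypothetical input with an extra "("

Console.WriteLine(string.Join(" ", nested));     // ((1+2)*(4+3)) (1+2) (4+3)
Console.WriteLine(string.Join(" ", unbalanced)); // (a)
```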
I've got a Regex: [-]?\d+[-+*^/\d]* and this line of code:
foreach(Match m in Regex.Matches(s, @"[-]?\d+[-+*^/\d]*")){
...
}
m.Value = (for example) 2+4*50
Is there any way to get a string array in the form of {"2", "+", "4", "*", "50"}?
First of all, you've got the wrong regular expression. You have a regex that matches the entire string, but you want to split the string on token boundaries, so you want to recognize tokens.
Second, don't attempt to solve the problem of whether - is a unary minus or an operator at lex time. That's a parse problem.
Third, you can use ordinary LINQ operators to turn the match collection into an array of strings.
Put it all together:
string s = "10*20+30-40/50";
var matches =
Regex.Matches(s, @"\d+|[-+*/]")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
This technique only works if your lexical grammar is regular, and many are not. (And even some languages that are technically regular are inconvenient to characterize as a regular expression.) As I noted in a comment: writing a lexer is not hard. Consider just writing a lexer rather than using regular expressions.
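Following that suggestion, here is a minimal hand-written lexer sketch (C# 9+ top-level statements; Tokenize is a hypothetical name). It assumes the tokens are non-negative ASCII integers and the single-character operators + - * / ^ from the question's regex. It skips whitespace and, per the advice above, always emits - as an operator, leaving unary minus to the parser.

```csharp
using System;
using System.Collections.Generic;

// Splits arithmetic text into integer and operator tokens.
static List<string> Tokenize(string input)
{
    var tokens = new List<string>();
    int i = 0;
    while (i < input.Length)
    {
        char c = input[i];
        if (char.IsWhiteSpace(c))
        {
            i++; // whitespace separates tokens but is not a token itself
        }
        else if (c >= '0' && c <= '9')
        {
            int start = i;
            while (i < input.Length && input[i] >= '0' && input[i] <= '9')
            {
                i++; // consume the whole run of digits
            }
            tokens.Add(input.Substring(start, i - start));
        }
        else if ("+-*/^".IndexOf(c) >= 0)
        {
            tokens.Add(c.ToString()); // "-" is always an operator here
            i++;
        }
        else
        {
            throw new FormatException($"Unexpected character '{c}' at position {i}.");
        }
    }
    return tokens;
}

Console.WriteLine(string.Join(" ", Tokenize("2+4*50")));   // 2 + 4 * 50
Console.WriteLine(string.Join(" ", Tokenize("10 * -20"))); // 10 * - 20
```

Adding a new token kind (identifiers, parentheses) means adding one more branch, which is often easier to read and debug than growing a regular expression.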
You can try splitting on word boundaries:
var st = "2+4*50";
var li = Regex.Split(st, @"\b");
foreach (var i in li)
Console.WriteLine(i);
2
+
4
*
50
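Two things to keep in mind with this approach, shown in a small sketch (C# 9+ top-level statements; SplitOnWordBoundaries is a hypothetical helper). First, Regex.Split includes empty strings where \b matches at the very start and end of the input, so they are filtered out here. Second, \b never falls between two adjacent non-word characters, so an operator followed by a minus sign stays a single token.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split on \b and drop the empty strings produced at the input's edges.
string[] SplitOnWordBoundaries(string input) =>
    Regex.Split(input, @"\b").Where(t => t.Length > 0).ToArray();

string[] simple = SplitOnWordBoundaries("2+4*50");

// "*" and "-" are both non-word characters, so no boundary separates them.
string[] adjacentOps = SplitOnWordBoundaries("10*-20");

Console.WriteLine(string.Join(" | ", simple));      // 2 | + | 4 | * | 50
Console.WriteLine(string.Join(" | ", adjacentOps)); // 10 | *- | 20
```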
This is very similar to the question "How do I extract text that lies between parentheses (round brackets)?", where I found this Regex code:
var matches = Regex.Matches("User name [[sales]] and [[anotherthing]]", @"\[\[([^)]*)\]\]");
But that doesn't seem to work with multi-character delimiters? This might not even be the correct way to go, but I am sure I am not the first to try this and I am drawing a blank here - anyone?
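Your @"\[\[([^)]*)\]\]" pattern matches two consecutive [[, followed by zero or more characters other than ), and then two ]]. That means that if you have a ) inside [[...]], there won't be a match. Also, since [^)]* is greedy and matches [ and ] as well, it runs on to the last ]] in the string, so on your sample input both values come back as a single match: sales]] and [[anotherthing.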
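A small sketch (C# 9+ top-level statements) that runs the question's pattern on the question's own string and on a hypothetical input with a ) inside the brackets. Groups1 is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] Groups1(string input) =>
    Regex.Matches(input, @"\[\[([^)]*)\]\]")
        .Cast<Match>()
        .Select(m => m.Groups[1].Value)
        .ToArray();

// Greedy [^)]* runs past the first "]]" to the last one.
string[] collapsed = Groups1("User name [[sales]] and [[anotherthing]]");

// [^)]* cannot cross the ')', so nothing matches.
string[] blocked = Groups1("[[a)b]]"); // hypothetical input

Console.WriteLine(collapsed.Length); // 1
Console.WriteLine(collapsed[0]);     // sales]] and [[anotherthing
Console.WriteLine(blocked.Length);   // 0
```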
To deal with substrings enclosed in multi-character delimiters, you can use one of two approaches: lazy dot matching or an unrolled pattern.
Note: to get multiple matches, use Regex.Matches as I wrote in my other answer.
1. Lazy dot solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}(.*?)]{2}", RegexOptions.Singleline)
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
See the regex demo. The RegexOptions.Singleline option is necessary so that . also matches newline characters.
2. Unrolled regex solution:
var s = "User name [[sales]] and [[anotherthing]]";
var matches = Regex.Matches(s, @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
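With this pattern, RegexOptions.Singleline is not necessary, because [^]] also matches newlines. It is also more efficient, since it consumes runs of non-] characters at once instead of retrying ]{2} after every character as the lazy dot does.
See the regex demo.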
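To compare the two solutions, here is a sketch (C# 9+ top-level statements) on a hypothetical input containing a lone ] inside a value and a value split across two lines. Extract is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] Extract(string input, string pattern, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options)
        .Cast<Match>()
        .Select(p => p.Groups[1].Value)
        .ToArray();

const string lazy = @"\[{2}(.*?)]{2}";
const string unrolled = @"\[{2}([^]]*(?:](?!])[^]]*)*)]{2}";

// Hypothetical input: a single "]" inside a value, and a line break.
string s = "[[a]b]] and [[line1\nline2]]";

string[] lazyDefault = Extract(s, lazy);                          // "." stops at \n
string[] lazySingleline = Extract(s, lazy, RegexOptions.Singleline);
string[] unrolledDefault = Extract(s, unrolled);                  // [^]] matches \n

Console.WriteLine(lazyDefault.Length);     // 1
Console.WriteLine(lazySingleline.Length);  // 2
Console.WriteLine(unrolledDefault.Length); // 2
```

Both patterns keep a lone ] inside the value; only the lazy dot version needs RegexOptions.Singleline to cross the line break.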
Use Regex.Matches:
Searches the specified input string for all occurrences of a specified regular expression.
Sample code:
var matches = Regex.Matches("User name (sales) and (anotherthing)", @"\(([^)]*)\)")
.Cast<Match>()
.Select(p => p.Groups[1].Value)
.ToList();
I have never used regular expressions because they seem so complicated, though I know they are dense and powerful. I thought I would give them a shot with your help.
How do I use regular expressions to extract all occurrences of %sometext% in a string variable and return a string array of matching items?
For example, if the input string is:
set NewVariable=%Variable1%%Variable2%%Variable3%SomeText%Variable4%
The output array would be:
Array[0]=Variable1
Array[1]=Variable2
Array[2]=Variable3
Array[3]=Variable4
The regex should look like this:
%([^%]*)%
The delimiters are on both sides, and the capturing group is in between them.
Here is how:
var mc = Regex.Matches(
"quick%brown%%fox%jumps%over%the%lazy%%dog%"
, "%([^%]*)%"
);
foreach (Match m in mc) {
Console.WriteLine(m.Groups[1]);
}
The output of the above looks like this:
brown
fox
over
lazy
dog
Here is a demo on ideone.
var NewVariable = "%Variable1%%Variable2%%Variable3%SomeText%Variable4%";
var Array = Regex.Matches(NewVariable, @"%(.+?)%")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToArray();
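Your regular expression is %[^%]+%. Look at the Regex.Matches method. Note that each match value then includes the surrounding % signs; put the middle part in a capturing group, %([^%]+)%, and read Groups[1].Value to get the names alone.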
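The three patterns in these answers agree on the asker's input but pair the % signs differently once an empty %% pair appears. A sketch (C# 9+ top-level statements; Values and Show are hypothetical helpers, and a%%b%c% is a hypothetical input) to make the difference visible:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Returns Group 1 when the pattern has a capturing group, else the whole match.
string[] Values(string input, string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Select(m => m.Groups.Count > 1 ? m.Groups[1].Value : m.Value)
        .ToArray();

string Show(string[] values) => string.Join(", ", values.Select(v => $"\"{v}\""));

string asked = "set NewVariable=%Variable1%%Variable2%%Variable3%SomeText%Variable4%";

// On the asker's input the grouped patterns give the bare names,
// while %[^%]+% keeps the surrounding % signs.
string[] starGroup = Values(asked, @"%([^%]*)%");
string[] lazyGroup = Values(asked, @"%(.+?)%");
string[] noGroup = Values(asked, @"%[^%]+%");
Console.WriteLine(Show(starGroup));

// With an empty %% pair, each pattern pairs the % signs differently.
string tricky = "a%%b%c%";
Console.WriteLine(Show(Values(tricky, @"%([^%]*)%"))); // "", "c"
Console.WriteLine(Show(Values(tricky, @"%(.+?)%")));   // "%b"
Console.WriteLine(Show(Values(tricky, @"%[^%]+%")));   // "%b%"
```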