c# regex matches exclude first and last character - c#

I know nothing about regex so I am asking this great community to help me out.
With the help of SO I manage to write this regex:
string input = "((isoCode=s)||(isoCode=a))&&(title=s)&&((ti=2)&&(t=2))||(t=2&&e>5)";
string pattern = #"\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!))\)|&&|\|\|";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[0].Value);
}
And the result is:
((isoCode=s)||(isoCode=a))
&&
(title=s)
&&
((ti=2)&&(t=2))
||
(t=2&&e>5)
but I need result like this (without first/last "(", ")"):
(isoCode=s)||(isoCode=a)
&&
title=s
&&
(ti=2)&&(t=2)
||
t=2&&e>5
Can it be done? I know I can do it with substring (removing first and last character), but I want to know if it can be done with regex.

You may use
\((?<R>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*(?(DEPTH)(?!)))\)|(?<R>&&|\|\|)
See the regex demo, grab Group "R" value.
Details
\( - an open (
(?<R> - start of the R named group:
(?> - start of the atomic group:
\((?<DEPTH>)| - an open ( and an empty string is pushed on the DEPTH group stack or
\)(?<-DEPTH>)| - a closing ) and an empty string is popped off the DEPTH group stack or
[^()]+ - 1+ chars other than ( and )
)* - zero or more repetitions
(?(DEPTH)(?!)) - a conditional construct that checks if the number of close and open parentheses is balanced
) - end of R named group
\) - a closing )
| - or
(?<R>&&|\|\|) - another occurrence of Group R matching either of the 2 subpatterns:
&& - a && substring
| - or
\|\| - a || substring.
C# code:
var pattern = #"\((?<R>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*(?(DEPTH)(?!)))\)|(?<R>&&|\|\|)";
var results = Regex.Match(input, pattern)
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();

Brief
You can use the regex below, but I'd still strongly suggest you write a proper parser for this instead of using regex.
Code
See regex in use here
\(((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)|&{2}|‌​\|{2}
Usage
See regex in use here
using System;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
string input = "((isoCode=s)||(isoCode=a))&&(title=s)&&((ti=2)&&(t=2))||(t=2&&e>5)";
string pattern = #"\(((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)|&{2}|\|{2}";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Success ? match.Groups[1].Value : match.Groups[0].Value);
}
}
}
Result
(isoCode=s)||(isoCode=a)
&&
title=s
&&
(ti=2)&&(t=2)
||
t=2&&e>5

Related

Problem with brackets in regular expression in C#

can anybody help me with regular expression in C#?
I want to create a pattern for this input:
{a? ab 12 ?? cd}
This is my pattern:
([A-Fa-f0-9?]{2})+
The problem are the curly brackets. This doesn't work:
{(([A-Fa-f0-9?]{2})+)}
It just works for
{ab}
I would use {([A-Fa-f0-9?]+|[^}]+)}
It captures 1 group which:
Match a single character present in the list below [A-Fa-f0-9?]+
Match a single character not present in the list below [^}]+
If you allow leading/trailing whitespace within {...} string, the expression will look like
{(?:\s*([A-Fa-f0-9?]{2}))+\s*}
See this regex demo
If you only allow a single regular space only between the values inside {...} and no space after { and before }, you can use
{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}
See this regex demo. Note this one is much stricter. Details:
{ - a { char
(?:\s*([A-Fa-f0-9?]{2}))+ - one or more occurrences of
\s* - zero or more whitespaces
([A-Fa-f0-9?]{2}) - Capturing group 1: two hex or ? chars
\s* - zero or more whitespaces
} - a } char.
See a C# demo:
var text = "{a? ab 12 ?? cd}";
var pattern = #"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";
var result = Regex.Matches(text, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
.ToList();
foreach (var list in result)
Console.WriteLine(string.Join("; ", list));
// => a?; ab; 12; ??; cd
If you want to capture pairs of chars between the curly's, you can use a single capture group:
{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}
Explanation
{ Match {
( Capture group 1
[A-Fa-f0-9?]{2} Match 2 times any of the listed characters
(?: [A-Fa-f0-9?]{2})* Optionally repeat a space and again 2 of the listed characters
) Close group 1
} Match }
Regex demo | C# demo
Example code
string pattern = #"{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}";
string input = #"{a? ab 12 ?? cd}
{ab}";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[1].Value);
}
Output
a? ab 12 ?? cd
ab

Regex recursive substitutions

I have 3 case of data:
{{test_data}}
{{!test_data}}
{{test_data1&&!test_data2}} // test_data2 might not have the !
and I need to translate those strings with:
mystring.test_data
!mystring.test_data
mystring.test_data1 && !mystring.test_data2
I'm fiddling around with the super-useful regex101.com and i managed to cover almost all of 3 cases with Regex.Replace(str, "{{2}(?:(!?)(\w*)(\|{2}|&{2})?)}{2}", "$1mystring.$2 $3");
I can't figure out how to use regex recursion to re-apply the (?: ) part until the }} and join together all the matches using the specified substitution pattern
Is that even possible??
edit: here's the regex101 page -> https://regex101.com/r/vIBVkQ/2
I would advise to use a more generic solution here, with smaller, easier to read and maintain regexps here: one (the longest) will be used to find the substrings you need (the longest one), then a simple \w+ pattern will be used to add the my_string. part and the other will add spaces around logical operators. The smaller regexps will be used inside a match evaluator, to manipulate the values found by the longest regex:
Regex.Replace(input, #"{{!?\w+(?:\s*(?:&&|\|\|)\s*!?\w+)*}}", m =>
Regex.Replace(
Regex.Replace(m.Value, #"\s*(&&|\|\|)\s*", " $1 "),
#"\w+",
"mystring.$&"
)
)
See the C# demo
The main regex matches:
{{ - a {{ substring
!? - an optional ! sign
\w+ - 1 or more word chars
(?:\s*(?:&&|\|\|)\s*!?\w+)* - 0+ sequences of:
\s* - 0+ whitespace chars
(?:&&|\|\|) - a && or || substring
\s* - 0+ whitespaces
!? - an optional !
\w+ - 1 or more word chars
}} - a }} substring.
Regex: (?:{{2}|[^|]{2}|[^&]{2})\!?(\w+)(?:}{2})?
Regex demo
C# code:
List<string> list = new List<string>() { "{{test_data}}", "{{!test_data}}", "{{test_data1&&!test_data2}}" };
foreach(string s in list)
{
string t = Regex.Replace(s, #"(?:{{2}|[^|]{2}|[^&]{2})\!?(\w+)(?:}{2})?",
o => o.Value.Contains("!") ? "!mystring." + o.Groups[1].Value : "mystring." + o.Groups[1].Value);
Console.WriteLine(t);
}
Console.ReadLine();
Output:
mystring.test_data
!mystring.test_data
mystring.test_data1&&!mystring.test_data2
I don't think you can use recursion, but with a different representation of your input pattern, you can use sub-groups. Note I used named captures to slightly limit the confusion in this example:
var test = #"{{test_data}}
{{!test_data}}
{{test_data1&&!test_data2&&test_data3}}
{{test_data1&&!test_data2 fail test_data3}}
{{test_data1&&test_data2||!test_data3}}";
// (1:!)(2:word)(3:||&&)(4:repeat)
var matches = Regex.Matches(test, #"\{{2}(?:(?<exc>!?)(?<word>\w+))(?:(?<op>\|{2}|&{2})(?<exc2>!?)(?<word2>\w+))*}{2}");
foreach (Match match in matches)
{
Console.WriteLine("Match: {0}", match.Value);
Console.WriteLine(" exc: {0}", match.Groups["exc"].Value);
Console.WriteLine(" word: {0}", match.Groups["word"].Value);
for (int i = 0; i < match.Groups["op"].Captures.Count; i++)
{
Console.WriteLine(" op: {0}", match.Groups["op"].Captures[i].Value);
Console.WriteLine(" exc2: {0}", match.Groups["exc2"].Captures[i].Value);
Console.WriteLine("word2: {0}", match.Groups["word2"].Captures[i].Value);
}
}
The idea is to read the first word in each group unconditionally and then possibly read N combinations of (|| or &&)(optional !)(word) as separate groups with sub-captures.
Example output:
Match: {{test_data}}
exc:
word: test_data
Match: {{!test_data}}
exc: !
word: test_data
Match: {{test_data1&&!test_data2&&test_data3}}
exc:
word: test_data1
op: &&
exc2: !
word2: test_data2
op: &&
exc2:
word2: test_data3
Match: {{test_data1&&test_data2||!test_data3}}
exc:
word: test_data1
op: &&
exc2:
word2: test_data2
op: ||
exc2: !
word2: test_data3
Note the line {{test_data1&&!test_data2 fail test_data3}} is not part of the result groups because it doesn't comply with the syntax rules.
So you can build your desired result the same way from the matches structure:
foreach (Match match in matches)
{
var sb = new StringBuilder();
sb.Append(match.Groups["exc"].Value).Append("mystring.").Append(match.Groups["word"].Value);
for (int i = 0; i < match.Groups["op"].Captures.Count; i++)
{
sb.Append(' ').Append(match.Groups["op"].Captures[i].Value).Append(' ');
sb.Append(match.Groups["exc2"].Value).Append("mystring.").Append(match.Groups["word2"].Value);
}
Console.WriteLine("Result: {0}", sb.ToString());
}

C# regex. Everything inside curly brackets{} and mod(%) charaters

I'm trying to get the values between {} and %% in a same Regex.
This is what I have till now. I can successfully get values individually for each but I was curious to learn about how can I combine both.
var regex = new Regex(#"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
#"%(.*?)%" gives me String and Stack
#"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(#"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
Similar to your regex. You can use Named Capturing Groups
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, #"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
See the regex demo
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
#"[\{|%](.*?)[\}|%]"
The idea being:
{ or %
anything
} or %
I think you should use a combination of conditional anda nested groups:
((\{(.*)\})|(%(.*)%))

How to grab specific elements out of a string

I need to be able to grab specific elements out of a string that start and end with curly brackets. If I had a string:
"asjfaieprnv{1}oiuwehern{0}oaiwefn"
How could I grab just the 1 followed by the 0.
Regex is very useful for this.
What you want to match is:
\{ # a curly bracket
# - we need to escape this with \ as it is a special character in regex
[^}] # then anything that is not a curly bracket
# - this is a 'negated character class'
+ # (at least one time)
\} # then a closing curly bracket
# - this also needs to be escaped as it is special
We can collapse this to one line:
\{[^}]+\}
Next, you can capture and extract the inner contents by surrounding the part you want to extract with parentheses to form a group:
\{([^}]+)\}
In C# you'd do:
var matches = Regex.Matches(input, #"\{([^}]+)\}");
foreach (Match match in matches)
{
var groupContents = match.Groups[1].Value;
}
Group 0 is the whole match (in this case including the { and }), group 1 the first parenthesized part, and so on.
A full example:
var input = "asjfaieprnv{1}oiuwehern{0}oaiwef";
var matches = Regex.Matches(input, #"\{([^}]+)\}");
foreach (Match match in matches)
{
var groupContents = match.Groups[1].Value;
Console.WriteLine(groupContents);
}
Outputs:
1
0
Use the Indexof method:
int openBracePos = yourstring.Indexof ("{");
int closeBracePos = yourstring.Indexof ("}");
string stringIWant = yourstring.Substring(openBracePos, yourstring.Len() - closeBracePos + 1);
That will get your first occurrence. You need to slice your string so that the first occurrence is no longer there, then repeat the above procedure to find your 2nd occurrence:
yourstring = yourstring.Substring(closeBracePos + 1);
Note: You MAY need to escape the curly braces: "{" - not sure about this; have never dealt with them in C#
This looks like a job for regular expressions
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string str = "asjfaieprnv{1}oiuwe{}hern{0}oaiwefn";
Regex regex = new Regex(#"\{(.*?)\}");
foreach( Match match in regex.Matches(str))
{
Console.WriteLine(match.Groups[1].Value);
}
}
}
}

How to parse a comma delimited string when comma and parenthesis exists in field

I have this string in C#
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
I want to use a RegEx to parse it to get the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
In addition to the above example, I tested with the following, but am still unable to parse it correctly.
"%exc.uns: 8 hours let # = ABC, DEF", "exc_it = 1 day" , " summ=graffe ", " a,b,(c,d)"
The new text will be in one string
string mystr = #"""%exc.uns: 8 hours let # = ABC, DEF"", ""exc_it = 1 day"" , "" summ=graffe "", "" a,b,(c,d)""";
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
if (str[i] == ',' && scopeLevel == 0)
{
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
firstIndex = i + 1;
}
else if (str[i] == '(') scopeLevel++;
else if (str[i] == ')') scopeLevel--;
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));
Event faster:
([^,]*\x28[^\x29]*\x29|[^,]+)
That should do the trick. Basically, look for either a "function thumbprint" or anything without a comma.
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
^ ^ ^ ^ ^
The Carets symbolize where the grouping stops.
Just this regex:
[^,()]+(\([^()]*\))?
A test example:
var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(#"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
.Cast<Match>()
.Select(m => m.Value);
returns
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
If you simply must use Regex, then you can split the string on the following:
, # match a comma
(?= # that is followed by
(?: # either
[^\(\)]* # no parens at all
| # or
(?: #
[^\(\)]* # ...
\( # (
[^\(\)]* # stuff in parens
\) # )
[^\(\)]* # ...
)+ # any number of times
)$ # until the end of the string
)
It breaks your input into the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
Another way to implement what Snowbear was doing:
public static string[] SplitNest(this string s, char src, string nest, string trg)
{
int scope = 0;
if (trg == null || nest == null) return null;
if (trg.Length == 0 || nest.Length < 2) return null;
if (trg.IndexOf(src) >= 0) return null;
if (nest.IndexOf(src) >= 0) return null;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == src && scope == 0)
{
s = s.Remove(i, 1).Insert(i, trg);
}
else if (s[i] == nest[0]) scope++;
else if (s[i] == nest[1]) scope--;
}
return s.Split(trg);
}
The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");
The function would first turn your string into
adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO
then split on the ~, ignoring the nested commas.
Assuming non nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:
MatchCollection matches = Regex.Matches(data, #"(?:[^(),]|\([^)]*\))+");
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var result = string.Join(#"\n",Regex.Split(s, #"(?<=\)),|,\s"));
The pattern matches for ) and excludes it from the match then matches ,
or
matches , followed by a space.
result =
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
The TextFieldParser (msdn) class seems to have the functionality built-in:
TextFieldParser Class: - Provides methods and properties for parsing structured text files.
Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.
The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.
See the article which helped me find that
Here's a stronger option, which parses the whole text, including nested parentheses:
string pattern = #"
\A
(?>
(?<Token>
(?:
[^,()] # Regular character
|
(?<Paren> \( ) # Opening paren - push to stack
|
(?<-Paren> \) ) # Closing paren - pop
|
(?(Paren),) # If inside parentheses, match comma.
)*?
)
(?(Paren)(?!)) # If we are not inside parentheses,
(?:,|\Z) # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
# though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);
You can get all matches by iterating over match.Groups["Token"].Captures.

Categories