RegEx for parsing repeated groups - c#

The source string contains tags like this:
>>>tagA
contents 1
<<<tagA
...
>>>tagB
contents 2
<<<tagB
...
I need to extract the tag names and the contents inside them. This is what I've got so far, but it still doesn't work:
(?<=(>>>(?<tagName>.+)$))(?<contents2>.*?)(?=(<<<.+)$)
It produces two matches, but tagName in the second match captures multiple lines:
tagA
contents 1
<<<tagA
What am I doing wrong?

You may use
>>>(?<tagName>.+?)[\r\n]+(?s:(?<contents>.*?))<<<
See the regex demo
Details
>>> - a >>> substring
(?<tagName>.+?) - Group "tagName": any 1+ chars as few as possible
[\r\n]+ - one or more CR or LF symbols
(?s:(?<contents>.*?)) - Group "contents": an inline modifier group matching any 0+ chars, but as few as possible
<<< - a <<< substring.
In C#:
var matches = Regex.Matches(s, @">>>(?<tagName>.+?)[\r\n]+(?s:(?<contents>.*?))<<<");
See the C# demo:
var s = ">>>tagA\ncontents 1\n<<<tagA\n...\n>>>tagB\ncontents 2\n<<<tagB\n...";
var matches = Regex.Matches(s, @">>>(?<tagName>.+?)[\r\n]+(?s:(?<contents>.*?))<<<");
foreach (Match m in matches) {
    Console.WriteLine(m.Groups["tagName"].Value);
    Console.WriteLine(m.Groups["contents"].Value);
}
Output:
tagA
contents 1
tagB
contents 2

Here, we could start with a simple expression bounded by >>> and <<<, perhaps something similar to:
>>>(.+)\s*(.+)\s*<<<.+
where the desired data ends up in the two capturing groups:
(.+)
and the rest of the problem can be handled in code.
Demo
Test
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @">>>(.+)\s*(.+)\s*<<<.+";
        string input = @">>>tagA
contents 1
<<<tagA
>>>tagB
contents 2
<<<tagB
>>>tagC
contents 2
<<<tagC
";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
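The test above prints whole matches only. As a minimal sketch of "scripting the rest", the two capturing groups can be read as tag name and contents. The class and method names here (TagGroupsSketch, ReadTags) are hypothetical, and the input is assumed to use "\n" line endings. Note that this pattern only handles contents that fit on a single line.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagGroupsSketch
{
    // Hypothetical helper: returns (tag name, contents) pairs
    // taken from capturing groups 1 and 2 of each match.
    public static List<KeyValuePair<string, string>> ReadTags(string input)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(input, @">>>(.+)\s*(.+)\s*<<<.+"))
        {
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value,    // tag name
                m.Groups[2].Value));  // contents (single line only)
        }
        return pairs;
    }

    public static void Main()
    {
        // Assumption: lines are separated by "\n" only.
        var input = ">>>tagA\ncontents 1\n<<<tagA\n>>>tagB\ncontents 2\n<<<tagB\n";
        foreach (var pair in ReadTags(input))
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```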

Related

Use RegEx to extract specific part from string

I have strings like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example these results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(input, @".*(\(\d*\))");
But I also get the brackets in the result. Is there a way to extract the strings without the brackets?
Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This captures anything between a pair of parentheses (not nested!), digits or otherwise, and anchors the match to the end of the string. If there may be whitespace at the end, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"\(([^()]+)\)$";
        string input = @"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            // Group 1 holds the text between the parentheses, without the brackets.
            Console.WriteLine("'{0}' found at index {1}.", m.Groups[1].Value, m.Groups[1].Index);
        }
    }
}
See a demo on regex101.com.
Please use the regex \(([^)]*)\)[^(]*$. It works as expected; I have tested it.
You can extract the number between the parentheses without worrying about capturing groups by using the following regex.
(?<=\()\d+(?=\)$)
demo
Explanation:
(?<=\() : positive lookbehind for (, meaning the match starts right after a ( without including it in the result.
\d+ : matches a run of digits until a non-digit character is found
(?=\)$) : positive lookahead for ) followed by the line end, meaning the match ends right before a ) at the end of the line, without including ) or the line end in the result.
Edit: If the number can be within parentheses that are not at the end of the line, remove $ from the regex to fix the match.
var match = Regex.Match(input, @".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1
Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
    "Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
    var match = Regex.Match(input, @"\(\d+\)");
    Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
    var match = Regex.Match(input, @"\((\d+)\)");
    Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
    var match = Regex.Match(input, @"(?<=\()\d+(?=\))");
    Console.WriteLine(match.Value);
}
Console.WriteLine();

Regex matching multiple field value in a single line

I wish to match multiple field/value pairs delimited by a colon on a single line, but each field and value text can contain spaces
e.g.
field1 : value1a value1b
answer
match1: Group1=field1, Group2=value1a value1b
or
field1 : value1a value1b field2 : value2a value2b
answer
match1: Group1=field1, Group2=value1a value1b
match2: Group1=field2, Group2=value2a value2b
The best I can do right now is (\w+)\s*:\s*(\w+)
Regex regex = new Regex(@"(\w+)\s*:\s*(\w+)");
Match m = regex.Match("field1 : value1a value1b field2 : value2a value2b");
while (m.Success)
{
    string f = m.Groups[1].Value.Trim();
    string v = m.Groups[2].Value.Trim();
    m = m.NextMatch(); // advance, otherwise the loop never ends
}
I guess a lookahead may help, but I don't know how to write it.
Thank you
You may try
(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)
(\w+) group 1, one or more consecutive word characters
\s*:\s* a colon with optional whitespace around it
(...) group 2
(?:...)* a non-capturing group, repeated zero or more times
(?!\s*\w+\s*:). a negative lookahead followed by a single character: the character is consumed only if the text at that position does not form a word, surrounded by optional whitespace, followed by a colon. Thus group 2 never consumes the word before the next colon
See the test cases
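A minimal sketch of how this tempered-greedy-token pattern could be used from C#, run against the sample line from the question. The names FieldParserSketch and ParseFields are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldParserSketch
{
    // Hypothetical helper: returns (field, value) pairs found in one line.
    public static List<KeyValuePair<string, string>> ParseFields(string line)
    {
        // Group 2 consumes one character at a time, and only while the
        // text ahead is not "optional spaces, word, optional spaces, colon".
        var pattern = @"(\w+)\s*:\s*((?:(?!\s*\w+\s*:).)*)";
        var fields = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(line, pattern))
        {
            fields.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return fields;
    }

    public static void Main()
    {
        var line = "field1 : value1a value1b field2 : value2a value2b";
        foreach (var field in ParseFields(line))
        {
            Console.WriteLine("{0} => {1}", field.Key, field.Value);
        }
        // Prints:
        // field1 => value1a value1b
        // field2 => value2a value2b
    }
}
```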
You can use a regex based on a lazy dot:
var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
See the C# demo online and the .NET regex demo (please mind that regex101.com does not support .NET regex flavor).
As you see, there is no need to use a tempered greedy token. The regex means:
(\w+) - Group 1: one or more letters/digits/underscores
\s*:\s* - a colon enclosed with zero or more whitespace chars
(.*?) - Group 2: any zero or more chars other than a newline, as few as possible
(?=\s*\w+\s*:|$) - a positive lookahead that stops the match before the first occurrence of one or more word chars enclosed with zero or more whitespaces and followed by a colon, or at the end of string.
Full C# demo:
using System;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        var text = "field1 : value1a value1b field2 : value2a value2b";
        var matches = Regex.Matches(text, @"(\w+)\s*:\s*(.*?)(?=\s*\w+\s*:|$)");
        foreach (Match m in matches)
        {
            Console.WriteLine("-- MATCH FOUND --\nKey: {0}, Value: {1}",
                m.Groups[1].Value, m.Groups[2].Value);
        }
    }
}
Output:
-- MATCH FOUND --
Key: field1, Value: value1a value1b
-- MATCH FOUND --
Key: field2, Value: value2a value2b

Problem with brackets in regular expression in C#

Can anybody help me with a regular expression in C#?
I want to create a pattern for this input:
{a? ab 12 ?? cd}
This is my pattern:
([A-Fa-f0-9?]{2})+
The problem is the curly brackets. This doesn't work:
{(([A-Fa-f0-9?]{2})+)}
It just works for
{ab}
I would use {([A-Fa-f0-9?]+|[^}]+)}
It captures one group, which matches either:
[A-Fa-f0-9?]+ one or more characters present in the list
[^}]+ one or more characters other than }
If you allow leading/trailing whitespace within {...} string, the expression will look like
{(?:\s*([A-Fa-f0-9?]{2}))+\s*}
See this regex demo
If you allow only a single regular space between the values inside {...}, and no space after { or before }, you can use
{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}
See this regex demo. Note this one is much stricter. Details of the first, more lenient pattern:
{ - a { char
(?:\s*([A-Fa-f0-9?]{2}))+ - one or more occurrences of
\s* - zero or more whitespaces
([A-Fa-f0-9?]{2}) - Capturing group 1: two hex or ? chars
\s* - zero or more whitespaces
} - a } char.
See a C# demo:
var text = "{a? ab 12 ?? cd}";
var pattern = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";
var result = Regex.Matches(text, pattern)
    .Cast<Match>()
    .Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
    .ToList();
foreach (var list in result)
    Console.WriteLine(string.Join("; ", list));
// => a?; ab; 12; ??; cd
If you want to capture pairs of chars between the curly braces, you can use a single capture group:
{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}
Explanation
{ Match {
( Capture group 1
[A-Fa-f0-9?]{2} Match 2 times any of the listed characters
(?: [A-Fa-f0-9?]{2})* Optionally repeat a space and again 2 of the listed characters
) Close group 1
} Match }
Regex demo | C# demo
Example code
string pattern = @"{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}";
string input = @"{a? ab 12 ?? cd}
{ab}";
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine(m.Groups[1].Value);
}
Output
a? ab 12 ?? cd
ab

Split a string by Regex [duplicate]

I'm trying to work out how to split this kind of string using regex in C#.
[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]
Can someone knowledgeable about regex point me to how to achieve this?
Sample regex pattern that doesn't work:
[\dd,\dd,\dd]
sample output:
[01,01,01]
[02,03,00]
[03,07,00]
[04,06,00]
[05,02,00]
[06,04,00]
[07,08,00]
[08,05,00]
This will do the job in C# (\[.+?\]), e.g.:
var s = @"[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
var reg = new Regex(@"(\[.+?\])");
var matches = reg.Matches(s);
foreach (Match m in matches)
{
    Console.WriteLine($"{m.Value}");
}
EDIT This is how the expression (\[.+?\]) works
first, the outer parentheses, ( and ), capture whatever the inside pattern matched
then the escaped square brackets, \[ and \], match the [ and ] in the source string
finally, .+? matches one or more characters, but as few as possible, so that a single match won't swallow everything between the first [ and the last ]
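To make the effect of the lazy quantifier concrete, here is a small sketch that counts matches for the greedy and the lazy form on a shortened input. The names LazyVsGreedySketch and CountMatches are hypothetical helpers, not part of the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVsGreedySketch
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        var input = "[01,01,01][02,03,00][03,07,00]";

        // Greedy: .+ runs to the last ], so the whole string is one match.
        Console.WriteLine(CountMatches(input, @"\[.+\]"));   // 1

        // Lazy: .+? stops at the first ], giving one match per bracket group.
        Console.WriteLine(CountMatches(input, @"\[.+?\]"));  // 3
    }
}
```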
I know you stipulated Regex, however it's worth looking at Split again, if only for academic purposes:
Code
var input = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
var output = input.Split(']', StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x + "]") // put the closing bracket back
    .ToList();
foreach (var o in output)
    Console.WriteLine(o);
Output
[01,01,01]
[02,03,00]
[03,07,00]
[04,06,00]
[05,02,00]
[06,04,00]
[07,08,00]
[08,05,00]
The Regex solution below is restricted to 3 values of exactly 2 digits each, separated by commas. Inside the foreach loop you can access the matched text via match.Value.
Remember to include using System.Text.RegularExpressions;
var input = "[01,01,01][02,03,00][03,07,00][04,06,00][05,02,00][06,04,00][07,08,00][08,05,00]";
// No trailing + after the group: it would merge adjacent bracket groups into a single match.
foreach (Match match in Regex.Matches(input, @"\[\d{2},\d{2},\d{2}\]"))
{
    // do stuff with match.Value
}
Thanks all for the answers. I also got it working by using this code
string pattern = @"\[\d\d,\d\d,\d\d]";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(myResult);
Debug.WriteLine(matches.Count);
foreach (Match match in matches)
    Debug.WriteLine(match.Value);

How to grab specific elements out of a string

I need to be able to grab specific elements out of a string that start and end with curly brackets. If I had a string:
"asjfaieprnv{1}oiuwehern{0}oaiwefn"
How could I grab just the 1 followed by the 0?
Regex is very useful for this.
What you want to match is:
\{ # a curly bracket
# - we need to escape this with \ as it is a special character in regex
[^}] # then anything that is not a curly bracket
# - this is a 'negated character class'
+ # (at least one time)
\} # then a closing curly bracket
# - this also needs to be escaped as it is special
We can collapse this to one line:
\{[^}]+\}
Next, you can capture and extract the inner contents by surrounding the part you want to extract with parentheses to form a group:
\{([^}]+)\}
In C# you'd do:
var matches = Regex.Matches(input, @"\{([^}]+)\}");
foreach (Match match in matches)
{
    var groupContents = match.Groups[1].Value;
}
Group 0 is the whole match (in this case including the { and }), group 1 the first parenthesized part, and so on.
A full example:
var input = "asjfaieprnv{1}oiuwehern{0}oaiwef";
var matches = Regex.Matches(input, @"\{([^}]+)\}");
foreach (Match match in matches)
{
    var groupContents = match.Groups[1].Value;
    Console.WriteLine(groupContents);
}
Outputs:
1
0
Use the IndexOf method:
int openBracePos = yourstring.IndexOf("{");
int closeBracePos = yourstring.IndexOf("}");
string stringIWant = yourstring.Substring(openBracePos + 1, closeBracePos - openBracePos - 1);
That will get your first occurrence. You need to slice your string so that the first occurrence is no longer there, then repeat the above procedure to find your 2nd occurrence:
yourstring = yourstring.Substring(closeBracePos + 1);
Note: curly braces do not need escaping in an ordinary C# string literal, so "{" works as written.
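The find-and-repeat idea can be sketched as a loop. Instead of slicing the string after each hit, this version moves a start index forward, which is a swapped-in variation rather than the exact procedure described above. The names BraceScannerSketch and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class BraceScannerSketch
{
    // Hypothetical helper: collects the text inside each {...} pair, left to right.
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        int searchFrom = 0;
        while (true)
        {
            int open = text.IndexOf('{', searchFrom);
            if (open < 0) break;                      // no more opening braces
            int close = text.IndexOf('}', open + 1);
            if (close < 0) break;                     // unmatched opening brace
            // Take only the characters strictly between the braces.
            results.Add(text.Substring(open + 1, close - open - 1));
            searchFrom = close + 1;                   // continue after this pair
        }
        return results;
    }

    public static void Main()
    {
        foreach (var item in Extract("asjfaieprnv{1}oiuwehern{0}oaiwefn"))
        {
            Console.WriteLine(item);                  // prints 1, then 0
        }
    }
}
```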
This looks like a job for regular expressions
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "asjfaieprnv{1}oiuwe{}hern{0}oaiwefn";
            // .*? is lazy, so each match ends at the nearest closing brace
            Regex regex = new Regex(@"\{(.*?)\}");
            foreach (Match match in regex.Matches(str))
            {
                Console.WriteLine(match.Groups[1].Value);
            }
        }
    }
}
