Regex for multiple matches .net c# - c#

I'm working on a regex:
Regex regex = new Regex(@"(?<=[keyb])(.*?)(?=[\/keyb])");
With this regex I'm getting everything between the tags [keyb] and [/keyb].
example: [keyb]hello buddy[/keyb]
output: hello buddy
What if I want to get everything between [keyb][/keyb] and also between [keyb2][/keyb2]?
example: [keyb]hello buddy[/keyb] [keyb2]bye buddy[/keyb2]
output: hello buddy
bye buddy

Use
\[(keyb|keyb2)]([\w\W]*?)\[/\1]
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
keyb 'keyb'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
keyb2 'keyb2'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
] ']'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[\w\W]*? any character of: word characters (a-z,
A-Z, 0-9, _), non-word characters (all
but a-z, A-Z, 0-9, _) (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\[(keyb|keyb2)]([\w\W]*?)\[/\1]";
string input = @" [keyb]hello budy[/keyb] [keyb2]bye buddy[/keyb2]";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("Match found: {0}", m.Groups[2].Value);
}
}
}
Results:
Match found: hello budy
Match found: bye buddy

var pattern=@"\[keyb.*?](.+?)\[\/keyb.*?\]";
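A minimal sketch of the same idea with named groups, assuming the tag names all follow the form keyb plus optional digits (that assumption, and the names TagExtractor and Extract, are mine and not from the answers above). The named backreference \k<tag> makes the closing tag repeat whatever the opening tag matched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // [keyb]...[/keyb], [keyb2]...[/keyb2], etc.; the closing tag must repeat the opening name.
    // Assumption: tag names are "keyb" followed by optional digits.
    private static readonly Regex TagPattern = new Regex(
        @"\[(?<tag>keyb\d*)\](?<body>.*?)\[/\k<tag>\]",
        RegexOptions.Singleline); // Singleline lets '.' span line breaks inside a tag

    public static List<string> Extract(string input)
    {
        var bodies = new List<string>();
        foreach (Match m in TagPattern.Matches(input))
        {
            bodies.Add(m.Groups["body"].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        foreach (string body in Extract("[keyb]hello buddy[/keyb] [keyb2]bye buddy[/keyb2]"))
        {
            Console.WriteLine(body);
        }
        // prints:
        // hello buddy
        // bye buddy
    }
}
```

If only a fixed set of tags should be accepted, the alternation (keyb|keyb2) from the first answer is the stricter choice.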

Related

Tokenize a string using multiple conditions

For the string below:
var str = "value0 'value 1/5' 'x ' value2";
Is there a way I can parse that string such that I get
arr[0] = "value0";
arr[1] = "value 1/5";
arr[2] = "x ";
arr[3] = "value2";
The order of values that might come with single quotes is arbitrary. Case does not matter.
I can get all values between single quotes using a regex like
"'(.*?)'"
but I need the order of those values relative to the other, non-single-quoted values.
Use
'(?<val>.*?)'|(?<val>\S+)
See regex proof
EXPLANATION
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of \2
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"'(?<val>.*?)'|(?<val>\S+)";
string input = @"value0 'value 1/5' 'x ' value2";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups["val"].Value);
}
}
}
In C# you can reuse the same named capture group, so you could use an alternation | using the same group name for both parts.
'(?<val>[^']+)'|(?<val>\S+)
The pattern matches:
' Match a single quote
(?<val>[^']+) Capture in group val matching 1+ times any char except ' to not match an empty string
' Match a single quote
| Or
(?<val>\S+) Capture in group val matching 1+ times any non whitespace char
See a .NET regex demo or a C# demo
For example
string pattern = @"'(?<val>[^']+)'|(?<val>\S+)";
var str = "value0 'value 1/5' 'x ' value2";
foreach (Match m in Regex.Matches(str, pattern))
{
Console.WriteLine(m.Groups["val"].Value);
}
Output
value0
value 1/5
x
value2
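To get the tokens into an array in their original order, as the question asks, the matches can be projected with LINQ. This is a small sketch using the pattern from the answer above; the names QuoteTokenizer and Tokenize are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteTokenizer
{
    // Both alternatives write to the same named group, so one lookup covers both cases.
    private const string Pattern = @"'(?<val>[^']+)'|(?<val>\S+)";

    public static string[] Tokenize(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()                       // MatchCollection -> IEnumerable<Match>
                    .Select(m => m.Groups["val"].Value)  // quoted or bare value, quotes stripped
                    .ToArray();
    }

    public static void Main()
    {
        string[] arr = Tokenize("value0 'value 1/5' 'x ' value2");
        for (int i = 0; i < arr.Length; i++)
        {
            // The surrounding quotes make the trailing space in "x " visible.
            Console.WriteLine($"arr[{i}] = \"{arr[i]}\"");
        }
    }
}
```

Matches are returned left to right, so the array keeps quoted and unquoted values in the order they appear in the string.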

How do you state a regular expression in C# to skip a character, replace one with another and add a new character at a specific position?

I have a C# program that takes as input a subtitle text file with contents like this:
1
00: 00: 07.966 -> 00: 00: 11.166
How's the sea?
- This is great.
2
00: 00: 12.967 -> 00: 00: 15.766
It's really pretty.
What I want to do is basically correct it, so that it will skip any spaces, replace the . character with the , character and add another hyphen to the -> string, so that it will become -->. For the previous example, the correct output would be:
1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great.
2
00:00:12,967 --> 00:00:15,766
It's really pretty.
So far, I've thought about iterating through each line and checking if it starts and ends with a digit, like so:
if (line.StartsWith("[0-9]") && line.EndsWith("[0-9]")) {
}
I don't know how to state the regular expression to do this, though.
Please note that my input can have spaces anywhere in the subtitle timing line, not just after the : character, so the string can end up being as bad as this:
"^ 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 $"
It may not be a single regex that does everything, but I think that is actually an advantage and the logic is easy to follow and modify.
using var input = new StreamReader(inputPath);
using var output = new StreamWriter(outputPath);
// matches a timestamp line with a "->" and no alpha characters
var timestampRegex = new Regex(@"[^A-Za-z]*-\s*>[^A-Za-z]*");
string line;
while((line = input.ReadLine()) != null)
{
// if a timestamp line is found then it is modified
if (timestampRegex.IsMatch(line))
{
line = Regex.Replace(line, @"\s", ""); // remove all whitespace
line = line.Replace("->", " --> "); // update arrow style
}
output.WriteLine(line);
}
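A minimal per-line sketch that applies all three corrections from the question: strip whitespace, turn . into , and turn -> into -->. The names SubtitleFixer and FixLine and the timing-line pattern are assumptions for illustration; the pattern treats a line as a timing line only if it consists of digits, whitespace and : . , characters around a single -> arrow.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtitleFixer
{
    // Assumption: a timing line holds only digits, whitespace, ':', '.' and ','
    // on both sides of a "->" arrow (possibly written "- >").
    private static readonly Regex TimingLine =
        new Regex(@"^[\d\s:.,]+-\s*>[\d\s:.,]+$");

    public static string FixLine(string line)
    {
        if (!TimingLine.IsMatch(line))
        {
            return line; // counters and dialogue lines pass through unchanged
        }
        string compact = Regex.Replace(line, @"\s+", ""); // remove every whitespace character
        compact = compact.Replace('.', ',');              // 07.966 becomes 07,966
        return compact.Replace("->", " --> ");            // restore the arrow with its spacing
    }

    public static void Main()
    {
        Console.WriteLine(FixLine("00: 00: 07.966 -> 00: 00: 11.166"));
        // prints: 00:00:07,966 --> 00:00:11,166
        Console.WriteLine(FixLine("- This is great."));
        // prints: - This is great.
    }
}
```

Lines that already use --> do not match the timing pattern and are left as they are.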
You can solve it with the regular expression:
(?m)(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?
The replacement will be $1$2$3$3$4.
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?m) set flags for this block (with ^ and $
matching start and end of line) (case-
sensitive) (with . not matching \n)
(matching whitespace and # normally)
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the last m//g left off
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
^ the beginning of a "line"
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
\r? '\r' (carriage return) (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
a "line"
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
[ \t] any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
, ','
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[ \t] any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
[ \t] any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
)? end of grouping
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?";
string substitution = @"$1$2$3$3$4";
string input = @"1
00: 00: 07,966 -> 00: 00: 11,166
How's the sea?
- This is great.
2
00: 00: 12,967 -> 00: 00: 15,766
It's really pretty.";
RegexOptions options = RegexOptions.Multiline;
Regex regex = new Regex(pattern, options);
string result = regex.Replace(input, substitution);
Console.Write(result);
}
}
Results:
1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great.
2
00:00:12,967 --> 00:00:15,766
It's really pretty.

Find the position of numbers in string

I have the following strings, and I need to find the position of certain data in each of them.
Isaiah Kinney 06/2021 111111
Darius Knox 10/2020 111-334-555
Leo Wiley 07/2020 122-333
Stone Walls 11/2020 2112333
John Stone 12/2021 xxx-xx-xxx
I need to find the position of the last field on each line (the numbers marked in bold); when there are no digits, x characters appear in their place.
I tried " (\d+(\d+|-*)\d+)( |$)", but with this the match includes the whitespace before the number.
Use
(?<!\S)(?:x+(?:-x+)*|\d+(?:-\d+)*)$
See proof
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
x+ 'x' (1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
x+ 'x' (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
Code generated by regex101 seems to yield the expected details:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(?<!\S)(?:x+(?:-x+)*|\d+(?:-\d+)*)$";
string input = @"Isaiah Kinney 06/2021 111111
Darius Knox 10/2020 111-334-555
Leo Wiley 07/2020 122-333
Stone Walls 11/2020 2112333
John Stone 12/2021 xxx-xx-xxx";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
Output:
'111111' found at index 22.
'111-334-555' found at index 49.
'122-333' found at index 79.
'2112333' found at index 107.
'xxx-xx-xxx' found at index 134.
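The indexes above are offsets into the whole multi-line string. If the position within each individual line is what is needed, the same pattern can be applied one line at a time. This is a sketch under that assumption; TrailingFieldLocator and FindColumn are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingFieldLocator
{
    // Same pattern as above; without RegexOptions.Multiline, $ anchors at the end of the line passed in.
    private static readonly Regex Field =
        new Regex(@"(?<!\S)(?:x+(?:-x+)*|\d+(?:-\d+)*)$");

    // Returns the zero-based column where the trailing field starts, or -1 if the line has none.
    public static int FindColumn(string line)
    {
        Match m = Field.Match(line);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        Console.WriteLine(FindColumn("Isaiah Kinney 06/2021 111111"));   // prints: 22
        Console.WriteLine(FindColumn("John Stone 12/2021 xxx-xx-xxx"));  // prints: 19
    }
}
```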

Regular expression for a substring from a line

I have a line XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD
I need a regular expression that does the following:
If I find A01,A02,A03 or A11,A12,A13,A14 in my line, I have to replace them with "AA"
If I find B11,B12,B13, I have to replace them with "BB"
I have tried using
if (Regex.IsMatch(Value, "^A0[2-9]")|| Regex.IsMatch(Value, "^A1[0-5]"))
It didn't work. So basically, if I have A02, A03, A04, A05, A06, A07 or A10, A11, A12, ..., A15, I have to replace them with "AA".
Description
(?:((?:[AB](?=0[2-9]|1[0-5])))[0-9]{2}(?:(?=,\s*\1),|))*
Replace With: $1$1
This regular expression will do the following:
finds consecutive comma delimited runs of A02 - A15 or B02 - B15
replaces the entire run with either AA or BB
Example
Live Demo
https://regex101.com/r/gN8aP6/1
Sample text
XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD
Sample Matches
XX,VV,A01,AA,BB,ZZ,DD
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[AB] any character of: 'A', 'B'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
0 '0'
----------------------------------------------------------------------
[2-9] any character of: '2' to '9'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
1 '1'
----------------------------------------------------------------------
[0-5] any character of: '0' to '5'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ")
(0 or more times (matching the most
amount possible))
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
You have to remove the ^ from your expressions, e.g. use A0[2-9]. The ^ anchor only matches at the beginning of the input, and the codes you are looking for are not at the beginning.
Online Demo
.NET Fiddle Demo
using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.Text.RegularExpressions;
public static class Sample1
{
public static void Main()
{
var sampleInput = "XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD";
var results = Regex.Replace(sampleInput, "A0[2-9]|A1[0-5]", "AA");
Console.WriteLine("Line: {0}", results);
}
}
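Replacing each code individually leaves one AA per code. A sketch that collapses a whole comma-separated run into a single AA or BB, assuming the accepted codes are 02-09 and 10-15 as in the first answer; RunCollapser and Collapse are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RunCollapser
{
    // One code (A or B followed by 02-09 or 10-15), then any number of
    // ",<same letter><code>" repetitions; \k<p> forces the same letter.
    private static readonly Regex Run = new Regex(
        @"\b(?<p>[AB])(?:0[2-9]|1[0-5])(?:,\k<p>(?:0[2-9]|1[0-5]))*\b");

    public static string Collapse(string line)
    {
        // ${p}${p} doubles the captured letter: "AA" or "BB".
        return Run.Replace(line, "${p}${p}");
    }

    public static void Main()
    {
        Console.WriteLine(Collapse("XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD"));
        // prints: XX,VV,A01,AA,BB,ZZ,DD
    }
}
```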

Split string on whitespace ignoring parenthesis

I have a string such as this
(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))
I need to split it into two such as this
ed
Karlsruhe Univ. (TH) (Germany, F.R.)
Basically, ignoring whitespace and parenthesis within a parenthesis
Is it possible to use a regex to achieve this?
If you can have more parentheses, it's better to use balancing groups:
string text = "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))";
var charSetOccurences = new Regex(@"\(((?:[^()]|(?<o>\()|(?<-o>\)))+(?(o)(?!)))\)");
var charSetMatches = charSetOccurences.Matches(text);
foreach (Match match in charSetMatches)
{
Console.WriteLine(match.Groups[1].Value);
}
ideone demo
Breakdown:
\(( # First '(' and begin capture
(?:
[^()] # Match all non-parens
|
(?<o> \( ) # Match '(', and capture into 'o'
|
(?<-o> \) ) # Match ')', and delete the 'o' capture
)+
(?(o)(?!)) # Fails if 'o' stack isn't empty
)\) # Close capture and last opening brace
\((.*?)\)\s*\((.*)\)
You will get the two values in the two match groups \1 and \2.
Demo here: http://regex101.com/r/rP5kG2
The demo shows the result of a search and replace with the replacement \1\n\2, which seems to be exactly what you need.
string str = "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))";
Regex re = new Regex(@"\((.*?)\)\s*\((.*)\)");
Match match = re.Match(str);
In general, no.
You can't describe recursive patterns with a regular expression, since they cannot be recognized by a finite automaton.
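Where a regex is not a good fit, a plain depth counter is one alternative. The sketch below collects the contents of each top-level parenthesized group and leaves nested parentheses intact; ParenSplitter and TopLevelGroups are hypothetical names, and unbalanced closing parentheses are simply ignored.

```csharp
using System;
using System.Collections.Generic;

public static class ParenSplitter
{
    // Returns the text inside each top-level (...) group, nested parentheses included.
    public static List<string> TopLevelGroups(string text)
    {
        var groups = new List<string>();
        int depth = 0;   // current nesting level
        int start = -1;  // index just after the '(' that opened the current top-level group
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '(')
            {
                if (depth == 0) start = i + 1;
                depth++;
            }
            else if (text[i] == ')' && depth > 0)
            {
                depth--;
                if (depth == 0) groups.Add(text.Substring(start, i - start));
            }
        }
        return groups;
    }

    public static void Main()
    {
        foreach (string g in TopLevelGroups("(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))"))
        {
            Console.WriteLine(g);
        }
        // prints:
        // ed
        // Karlsruhe Univ. (TH) (Germany, F.R.)
    }
}
```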
