Regular expression for a substring from a line

I have the line XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD
I need a regular expression for the following:
If I find A01, A02, A03 or A11, A12, A13, A14 in my line, I have to replace them with "AA".
If I find B11, B12, B13, I have to replace them with "BB".
I have tried using:
if (Regex.IsMatch(Value, "^A0[2-9]") || Regex.IsMatch(Value, "^A1[0-5]"))
It didn't work. Basically, if I have A02, A03, A04, A05, A06, A07 or A10, A11, A12, ..., A15, I have to replace it with "AA".

Description
(?:((?:[AB](?=0[2-9]|1[0-5])))[0-9]{2}(?:(?=,\s*\1),|))*
Replace With: $1$1
This regular expression will do the following:
finds consecutive comma delimited runs of A02 - A15 or B02 - B15
replaces the entire run with either AA or BB
Example
Live Demo
https://regex101.com/r/gN8aP6/1
Sample text
XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD
Sample Result
XX,VV,A01,AA,BB,ZZ,DD
Explanation
NODE EXPLANATION
----------------------------------------------------------------------
(?: group, but do not capture (0 or more times
(matching the most amount possible)):
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[AB] any character of: 'A', 'B'
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
0 '0'
----------------------------------------------------------------------
[2-9] any character of: '2' to '9'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
1 '1'
----------------------------------------------------------------------
[0-5] any character of: '0' to '5'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
[0-9]{2} any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
(?= look ahead to see if there is:
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
\s* whitespace (\n, \r, \t, \f, and " ")
(0 or more times (matching the most
amount possible))
----------------------------------------------------------------------
\1 what was matched by capture \1
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
, ','
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
)* end of grouping
----------------------------------------------------------------------
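The answer above gives only the pattern and the replacement string. A minimal C# sketch of applying them with Regex.Replace might look like this; the class and method names (CollapseRuns, Apply) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CollapseRuns
{
    // Pattern from the answer: group 1 captures the letter (A or B) of a code
    // in the range 02-15; the trailing comma is consumed only when the next
    // code starts with the same letter, so a whole run is matched at once.
    private const string Pattern =
        @"(?:((?:[AB](?=0[2-9]|1[0-5])))[0-9]{2}(?:(?=,\s*\1),|))*";

    // "$1$1" writes the captured letter twice. The pattern also produces empty
    // matches elsewhere; group 1 is empty for those, so nothing is inserted.
    public static string Apply(string line) => Regex.Replace(line, Pattern, "$1$1");

    public static void Main()
    {
        // Prints XX,VV,A01,AA,BB,ZZ,DD
        Console.WriteLine(Apply("XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD"));
    }
}
```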

You have to remove the ^ from expressions like ^A0[2-9]: ^ anchors the match to the beginning of the string, and the codes you are looking for are not at the beginning of your line.
using System;
using System.Text.RegularExpressions;
public static class Sample1
{
    public static void Main()
    {
        var sampleInput = "XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD";
        // No ^ anchor: the codes can appear anywhere in the line.
        // Each code from A02 to A15 is replaced by "AA".
        var results = Regex.Replace(sampleInput, "A0[2-9]|A1[0-5]", "AA");
        Console.WriteLine("Line: {0}", results);
    }
}
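The code above only handles the A codes and leaves B11, B12, B13 untouched. Assuming each code should be replaced individually (rather than collapsing a run into a single AA or BB), one pass with a MatchEvaluator can handle both letters. This sketch uses \b instead of ^ so that only whole comma-separated fields are replaced; ReplaceCodes and Apply are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceCodes
{
    // Group 1 is the letter; \b on both sides means only a whole field such as
    // "A12" matches, not a substring of a longer token such as "XA021".
    private static readonly Regex Code = new Regex(@"\b([AB])(?:0[2-9]|1[0-5])\b");

    // Replace each matching code with its letter written twice ("AA" or "BB").
    public static string Apply(string line) =>
        Code.Replace(line, m => m.Groups[1].Value + m.Groups[1].Value);

    public static void Main()
    {
        // Prints XX,VV,A01,AA,AA,AA,AA,AA,AA,BB,BB,BB,ZZ,DD
        Console.WriteLine(Apply("XX,VV,A01,A02,A03,A11,A12,A13,A14,B11,B12,B13,ZZ,DD"));
    }
}
```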

Related

Regex for multiple matches .net c#

I'm working on a regex:
Regex regex = new Regex(@"(?<=[keyb])(.*?)(?=[\/keyb])");
With this regex I'm getting everything between the tags [keyb] and [/keyb].
Example: [keyb]hello buddy[/keyb]
Output: hello buddy
What if I want to get everything between [keyb][/keyb] and also between [keyb2][/keyb2]?
Example: [keyb]hello buddy[/keyb] [keyb2]bye buddy[/keyb2]
Output: hello buddy
bye buddy

Use
\[(keyb|keyb2)]([\w\W]*?)\[/\1]
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
keyb 'keyb'
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
keyb2 'keyb2'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
] ']'
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
[\w\W]*? any character of: word characters (a-z,
A-Z, 0-9, _), non-word characters (all
but a-z, A-Z, 0-9, _) (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
\[ '['
--------------------------------------------------------------------------------
/ '/'
--------------------------------------------------------------------------------
\1 what was matched by capture \1
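--------------------------------------------------------------------------------
] ']'
--------------------------------------------------------------------------------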
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        // Group 1 is the tag name; \1 requires the closing tag to use the same name.
        // Group 2 is the text between the tags ([\w\W]*? also matches line breaks).
        string pattern = @"\[(keyb|keyb2)]([\w\W]*?)\[/\1]";
        string input = @" [keyb]hello buddy[/keyb] [keyb2]bye buddy[/keyb2]";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("Match found: {0}", m.Groups[2].Value);
        }
    }
}
Results:
Match found: hello buddy
Match found: bye buddy

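A looser alternative accepts any tag whose name starts with keyb, but it does not check that the closing tag matches the opening one:
var pattern = @"\[keyb.*?](.+?)\[\/keyb.*?\]";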
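Both patterns above either hard-code the tag names or accept any closing tag. If a tag can be keyb followed by any number of digits and the closing tag must repeat the opening name, a named group with the \k<name> backreference is one option. This is a sketch under that assumption; TagExtractor and Bodies are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagExtractor
{
    // Assumption: a tag name is "keyb" followed by zero or more digits.
    // \k<tag> requires the closing tag to repeat exactly the opening name, and
    // Singleline lets .*? cross line breaks, as [\w\W]*? does in the first answer.
    private static readonly Regex Tag = new Regex(
        @"\[(?<tag>keyb\d*)\](?<body>.*?)\[/\k<tag>\]",
        RegexOptions.Singleline);

    // Returns the text between each matching pair of tags, in input order.
    public static List<string> Bodies(string input)
    {
        var bodies = new List<string>();
        foreach (Match m in Tag.Matches(input))
        {
            bodies.Add(m.Groups["body"].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        foreach (string body in Bodies("[keyb]hello buddy[/keyb] [keyb2]bye buddy[/keyb2]"))
        {
            Console.WriteLine(body);
        }
    }
}
```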

Tokenize a string using multiple conditions

For the string below:
var str = "value0 'value 1/5' 'x ' value2";
Is there a way I can parse that string so that I get:
arr[0] = "value0";
arr[1] = "value 1/5";
arr[2] = "x ";
arr[3] = "value2";
The order of values that might come with single quotes is arbitrary. Case does not matter.
I can get all values between single quotes using a regex like
"'(.*?)'"
but I also need to keep the order of those values relative to the other, non-quoted values.

Use
'(?<val>.*?)'|(?<val>\S+)
See regex proof
EXPLANATION
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
(?<val> group and capture to 'val':
--------------------------------------------------------------------------------
.*? any character except \n (0 or more times
(matching the least amount possible))
--------------------------------------------------------------------------------
) end of 'val'
--------------------------------------------------------------------------------
' '\''
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
(?<val> group and capture to 'val' (the same
group as in the first alternative):
--------------------------------------------------------------------------------
\S+ non-whitespace (all but \n, \r, \t, \f,
and " ") (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
) end of 'val'
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        // Both alternatives capture into the same group "val", so each match
        // yields one token, in the order it appears in the input.
        string pattern = @"'(?<val>.*?)'|(?<val>\S+)";
        string input = @"value0 'value 1/5' 'x ' value2";
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine(m.Groups["val"].Value);
        }
    }
}

In .NET you can reuse the same capture group name, so you can use an alternation | with the same group name in both branches:
'(?<val>[^']+)'|(?<val>\S+)
The pattern matches:
' Match a single quote
(?<val>[^']+) Capture in group val one or more characters other than ', so the quoted part cannot be empty
' Match a single quote
| Or
(?<val>\S+) Capture in group val matching 1+ times any non whitespace char
See a .NET regex demo or a C# demo
For example (C# 9 or later, using top-level statements):
using System;
using System.Text.RegularExpressions;
string pattern = @"'(?<val>[^']+)'|(?<val>\S+)";
var str = "value0 'value 1/5' 'x ' value2";
foreach (Match m in Regex.Matches(str, pattern))
{
    Console.WriteLine(m.Groups["val"].Value);
}
Output
value0
value 1/5
x
value2
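The question asks for an array (arr[0], arr[1], ...). Assuming LINQ is available, the matches can be projected straight into a string[]; Tokenizer and Tokenize are hypothetical names used only for this sketch.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Same pattern as above: a quoted value or a run of non-whitespace,
    // both captured into the group "val".
    private static readonly Regex Token = new Regex(@"'(?<val>[^']+)'|(?<val>\S+)");

    // Matches are returned left to right, so the array keeps the input order.
    public static string[] Tokenize(string str) =>
        Token.Matches(str)
             .Cast<Match>() // also works on .NET Framework, where MatchCollection is not IEnumerable<Match>
             .Select(m => m.Groups["val"].Value)
             .ToArray();

    public static void Main()
    {
        string[] arr = Tokenize("value0 'value 1/5' 'x ' value2");
        for (int i = 0; i < arr.Length; i++)
        {
            Console.WriteLine("arr[{0}] = \"{1}\"", i, arr[i]);
        }
    }
}
```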

How do you state a regular expression in C# to skip a character, replace one with another and add a new character at a specific position?

I have a C# program that takes as input a subtitle text file with contents like this:
1
00: 00: 07.966 -> 00: 00: 11.166
How's the sea?
- This is great.
2
00: 00: 12.967 -> 00: 00: 15.766
It's really pretty.
What I want to do is basically correct it, so that it removes any spaces, replaces the . character with the , character and adds another hyphen to the -> string, so that it becomes -->. For the previous example, the correct output would be:
1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great.
2
00:00:12,967 --> 00:00:15,766
It's really pretty.
So far, I've thought about iterating through each line and checking if it starts and ends with a digit, like so:
if (line.StartsWith("[0-9]") && line.EndsWith("[0-9]")) {
}
I don't know how to state the regular expression to do this, though.
Please note that my input can have spaces anywhere in the subtitle timing line, not just after the : character, so the string can end up being as bad as this:
"^ 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 $"

It is not a single regex that does everything, but I think that is actually an advantage: the logic is easy to follow and modify.
// C# 9+ top-level program
using System.IO;
using System.Text.RegularExpressions;
// Hypothetical file names; replace them with your own paths.
string inputPath = "input.srt";
string outputPath = "output.srt";
using var input = new StreamReader(inputPath);
using var output = new StreamWriter(outputPath);
// matches a whole timestamp line: a "->" (optionally with spaces between - and >) and no letters
var timestampRegex = new Regex(@"^[^A-Za-z]*-\s*>[^A-Za-z]*$");
string line;
while ((line = input.ReadLine()) != null)
{
    // if a timestamp line is found then it is modified
    if (timestampRegex.IsMatch(line))
    {
        line = Regex.Replace(line, @"\s", ""); // remove all whitespace
        line = line.Replace('.', ',');         // decimal point -> comma
        line = line.Replace("->", " --> ");    // update arrow style
    }
    output.WriteLine(line);
}
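If you would rather reject malformed timing lines than patch them with string replacements, a stricter variant can strip the whitespace first and then require the full timestamp shape. This is a minimal sketch, assuming timestamps are always HH:MM:SS plus three millisecond digits, as in the sample; SubtitleTiming and Normalize are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtitleTiming
{
    // Assumed shape of a timing line once all whitespace is removed:
    // HH:MM:SS.mmm->HH:MM:SS.mmm, accepting '.' or ',' before the milliseconds
    // and "->" or "-->" as the arrow.
    private static readonly Regex Timing = new Regex(
        @"^(\d{2}:\d{2}:\d{2})[.,](\d{3})-{1,2}>(\d{2}:\d{2}:\d{2})[.,](\d{3})$");

    // Returns the normalized timing line, or the line unchanged if it does not
    // have the expected shape (subtitle text, sequence numbers, blank lines).
    public static string Normalize(string line)
    {
        string compact = Regex.Replace(line, @"\s", "");
        Match m = Timing.Match(compact);
        if (!m.Success)
        {
            return line;
        }
        return $"{m.Groups[1].Value},{m.Groups[2].Value} --> {m.Groups[3].Value},{m.Groups[4].Value}";
    }

    public static void Main()
    {
        // The worst-case line from the question
        Console.WriteLine(Normalize(" 0 0 : 0 0 : 0 7 . 9 6 6 -> 0 0 : 0 0 : 1 1 . 1 6 6 "));
        Console.WriteLine(Normalize("- This is great."));
    }
}
```

Capturing the parts and rebuilding the line means the output format is fixed by the code, not by whatever separators happened to be in the input.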

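You can solve it with the following regular expression. Note that it expects the decimal separator to already be a comma, and exactly one space (or tab) after each colon and on each side of the arrow, as in the sample input of the C# code: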
(?m)(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?
The replacement will be $1$2$3$3$4.
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?m) set flags for this block (with ^ and $
matching start and end of line) (case-
sensitive) (with . not matching \n)
(matching whitespace and # normally)
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
\G where the previous match ended
--------------------------------------------------------------------------------
(?! look ahead to see if there is not:
--------------------------------------------------------------------------------
\A the beginning of the string
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
^ the beginning of a "line"
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
.* any character except \n (0 or more
times (matching the most amount
possible))
--------------------------------------------------------------------------------
\d digits (0-9)
--------------------------------------------------------------------------------
\r? '\r' (carriage return) (optional
(matching the most amount possible))
--------------------------------------------------------------------------------
$ before an optional \n, and the end of
a "line"
--------------------------------------------------------------------------------
) end of look-ahead
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\d{2} digits (0-9) (2 times)
--------------------------------------------------------------------------------
: ':'
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
[ \t] any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
(?: group, but do not capture (optional
(matching the most amount possible)):
--------------------------------------------------------------------------------
( group and capture to \2:
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
, ','
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
[ \t] any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
) end of \2
--------------------------------------------------------------------------------
( group and capture to \3:
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
) end of \3
--------------------------------------------------------------------------------
( group and capture to \4:
--------------------------------------------------------------------------------
> '>'
--------------------------------------------------------------------------------
[ \t] any character of: ' ', '\t' (tab)
--------------------------------------------------------------------------------
) end of \4
--------------------------------------------------------------------------------
)? end of grouping
C# code:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        // \r? is placed before $ (as in the pattern above) so the lookahead also works with \r\n line endings
        string pattern = @"(?:\G(?!\A)|^(?=\d.*\d\r?$))(\d{2}:)[ \t](?:(\d+,\d+[ \t])(-)(>[ \t]))?";
        string substitution = @"$1$2$3$3$4";
        string input = @"1
00: 00: 07,966 -> 00: 00: 11,166
How's the sea?
- This is great.
2
00: 00: 12,967 -> 00: 00: 15,766
It's really pretty.";
        RegexOptions options = RegexOptions.Multiline;
        Regex regex = new Regex(pattern, options);
        string result = regex.Replace(input, substitution);
        Console.Write(result);
    }
}
Results:
1
00:00:07,966 --> 00:00:11,166
How's the sea?
- This is great.
2
00:00:12,967 --> 00:00:15,766
It's really pretty.

Find the position of numbers in string

I have the following strings, in which I need to know the position of specific data:
Isaiah Kinney 06/2021 111111
Darius Knox 10/2020 111-334-555
Leo Wiley 07/2020 122-333
Stone Walls 11/2020 2112333
John Stone 12/2021 xxx-xx-xxx
I need to know the position of the number at the end of each line; when there is no number, there are x characters instead.
I tried " (\d+(\d+|-*)\d+)( |$)", but it also matches the whitespace before the number.

Use
(?<!\S)(?:x+(?:-x+)*|\d+(?:-\d+)*)$
See proof
EXPLANATION
NODE EXPLANATION
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
\S non-whitespace (all but \n, \r, \t, \f,
and " ")
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
(?: group, but do not capture:
--------------------------------------------------------------------------------
x+ 'x' (1 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
x+ 'x' (1 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
(?: group, but do not capture (0 or more
times (matching the most amount
possible)):
--------------------------------------------------------------------------------
- '-'
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
)* end of grouping
--------------------------------------------------------------------------------
) end of grouping
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string (with RegexOptions.Multiline, as in
the code below, the end of each line)
Code generated by regex101 seems to yield the expected details:
using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string pattern = @"(?<!\S)(?:x+(?:-x+)*|\d+(?:-\d+)*)$";
        string input = @"Isaiah Kinney 06/2021 111111
Darius Knox 10/2020 111-334-555
Leo Wiley 07/2020 122-333
Stone Walls 11/2020 2112333
John Stone 12/2021 xxx-xx-xxx";
        RegexOptions options = RegexOptions.Multiline;
        foreach (Match m in Regex.Matches(input, pattern, options))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
Output:
'111111' found at index 22.
'111-334-555' found at index 49.
'122-333' found at index 79.
'2112333' found at index 107.
'xxx-xx-xxx' found at index 134.
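In .NET, the multiline $ only matches before \n, so if the text uses \r\n line endings the combined-input approach above only finds the last line. A sketch that instead applies the same pattern to one line at a time (for example, lines returned by File.ReadAllLines) reports a column within each line; LastField and Column are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastField
{
    // The answer's pattern, applied to a single line at a time (no Multiline
    // option needed): the trailing number or x-mask, preceded by whitespace
    // or the start of the line.
    private static readonly Regex Field = new Regex(@"(?<!\S)(?:x+(?:-x+)*|\d+(?:-\d+)*)$");

    // Zero-based column of the trailing field within the line, or -1 if none.
    public static int Column(string line)
    {
        Match m = Field.Match(line);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        // In practice the lines could come from File.ReadAllLines, which strips the line terminators.
        string[] lines =
        {
            "Isaiah Kinney 06/2021 111111",
            "Darius Knox 10/2020 111-334-555",
            "Leo Wiley 07/2020 122-333",
            "Stone Walls 11/2020 2112333",
            "John Stone 12/2021 xxx-xx-xxx",
        };
        foreach (string line in lines)
        {
            Console.WriteLine("{0} -> column {1}", line, Column(line));
        }
    }
}
```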

Regular expression for validating arithmetic expression

I have an arithmetic expression:
string exp = "((2+3.1)/2)*4.456";
I want to validate it using a regular expression. The expression can only contain integers, floating-point numbers, operators and parentheses.
How can I write a regular expression to validate it? Alternatively, please suggest another way to validate that string.

Using Perl/PCRE, we could verify such simple arithmetic expressions with the help of a pattern structured like this:
expr = pnum ( op pnum )*
pnum = num | \( expr \)
where num and op are defined as required. For example:
num = -?+\d++(?:\.\d++)?+
op = [-+*/]
Which would give us the following working expression:
(?x)^ (?&expr) $
(?(DEFINE)
(?<expr> (?&pnum) (?: (?&op) (?&pnum) )*+ )
(?<pnum> (?> (?&num) | \( (?&expr) \) ) )
(?<num> -?+\d++(?:\.\d++)?+ )
(?<op> [-+*/] )
)
But such patterns cannot be used with .NET regex, as it does not support (recursive) subpattern calls (?&name).
Instead, the .NET regex library offers a special feature of its own: balancing groups.
With balancing groups we can replace the recursive call used in pnum with a structure like this:
expr = pnum ( op pnum )* (?(p)(?!))
pnum = (?> (?<p> \( )* num (?<-p> \) )* )
What we've done here is to allow any number of optional opening and closing parentheses before and after every number, counting the total number of open parentheses (?<p> \( ), subtracting closing parentheses from that number (?<-p> \) ), and at the end of the expression making sure that the number of open parentheses is 0 (?(p)(?!)). Because (?<-p> ...) fails when there is nothing left to subtract, the count can never drop below zero, so a ) without a preceding ( is rejected.
(I believe this is equivalent to the original structure, although I haven't made a formal proof.)
Resulting in the following .NET pattern:
(?x)
^
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
(?>(?:
[-+*/]
(?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
)*)
(?(p)(?!))
$
C# Example:
using System;
using System.Text.RegularExpressions;
namespace RegexTest
{
    class Program
    {
        static void Main(string[] args)
        {
            var expressions = new string[] {
                "((2+3.1)/2)*4.456",
                "1",
                "(2)",
                "2+2",
                "(1+(2+3))",
                "-2*(2+-2)",
                "1+(3/(2+7-(4+3)))",
                "1-",
                "2+2)",
                "(2+2",
                "(1+(2+3)",
            };
            // (?x) ignores the whitespace and line breaks inside this verbatim string.
            // p counts open parentheses, (?<-p> ...) removes one per ")",
            // and (?(p)(?!)) fails the match if any "(" is left unclosed.
            var regex = new Regex(@"(?x)
                ^
                (?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
                (?>(?:
                    [-+*/]
                    (?> (?<p> \( )* (?>-?\d+(?:\.\d+)?) (?<-p> \) )* )
                )*)
                (?(p)(?!))
                $
            ");
            foreach (var expr in expressions)
            {
                Console.WriteLine("Expression: " + expr);
                Console.WriteLine(" Result: " + (regex.IsMatch(expr) ? "Matched" : "Failed"));
            }
        }
    }
}
Output:
Expression: ((2+3.1)/2)*4.456
Result: Matched
Expression: 1
Result: Matched
Expression: (2)
Result: Matched
Expression: 2+2
Result: Matched
Expression: (1+(2+3))
Result: Matched
Expression: -2*(2+-2)
Result: Matched
Expression: 1+(3/(2+7-(4+3)))
Result: Matched
Expression: 1-
Result: Failed
Expression: 2+2)
Result: Failed
Expression: (2+2
Result: Failed
Expression: (1+(2+3)
Result: Failed

You could write a simple lexer and parser in F# using fslex/fsyacc. Here is an example that is very close to your requirement: http://blogs.msdn.com/b/chrsmith/archive/2008/01/18/fslex-sample.aspx
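Along the lines of using a parser instead of a regex, here is a C# sketch of a hand-written recursive-descent validator. It is a different technique from fslex/fsyacc, swapped in here for illustration, and it checks the same grammar as the regex answer: expr = pnum (op pnum)*, pnum = num | ( expr ), num = -?\d+(\.\d+)?, op = [-+*/]. For the sample expressions in that answer it should give the same Matched/Failed results. ExpressionValidator is a hypothetical name; very deeply nested input could exhaust the call stack.

```csharp
using System;

// Hypothetical recursive-descent validator for the grammar
//   expr = pnum (op pnum)*
//   pnum = num | "(" expr ")"
//   num  = "-"? digits ("." digits)?
//   op   = "+" | "-" | "*" | "/"
public sealed class ExpressionValidator
{
    private readonly string _s;
    private int _pos;

    private ExpressionValidator(string s) => _s = s;

    // Valid only if an expr is parsed and the whole input was consumed.
    public static bool IsValid(string s)
    {
        var v = new ExpressionValidator(s);
        return v.ParseExpr() && v._pos == s.Length;
    }

    private bool Peek(char c) => _pos < _s.Length && _s[_pos] == c;

    private bool ParseExpr()
    {
        if (!ParsePnum()) return false;
        // Each operator must be followed by another pnum.
        while (_pos < _s.Length && "+-*/".IndexOf(_s[_pos]) >= 0)
        {
            _pos++;
            if (!ParsePnum()) return false;
        }
        return true;
    }

    private bool ParsePnum()
    {
        if (!Peek('(')) return ParseNum();
        _pos++;                                   // consume "("
        if (!ParseExpr() || !Peek(')')) return false;
        _pos++;                                   // consume ")"
        return true;
    }

    private bool ParseNum()
    {
        if (Peek('-')) _pos++;                    // optional sign
        if (!ParseDigits()) return false;
        if (Peek('.'))
        {
            _pos++;
            return ParseDigits();                 // a "." must be followed by digits
        }
        return true;
    }

    // Consumes one or more ASCII digits.
    private bool ParseDigits()
    {
        int start = _pos;
        while (_pos < _s.Length && _s[_pos] >= '0' && _s[_pos] <= '9') _pos++;
        return _pos > start;
    }

    public static void Main()
    {
        string[] expressions = { "((2+3.1)/2)*4.456", "-2*(2+-2)", "1-", "2+2)", "(1+(2+3)" };
        foreach (string expr in expressions)
        {
            Console.WriteLine("{0}: {1}", expr, IsValid(expr) ? "Matched" : "Failed");
        }
    }
}
```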
