Capture repeating pattern after const string in C# - c#

I need any number of Versions from this string:
magic-string: [\"1.0.2.2 \", \"1.2\", \"1.1\"];
What I have:
[\s""\\]+([\d\.]+)+[\s""\\]+
Matches:
1.0.2.2
1.2
1.1
Fine so far, but I want to ensure that the "magic-string" is available as well and this will not match:
any-random-string: [\"1.0.2.2 \", \"1.2\", \"1.1\"];
EDIT:
Working solution in C#:
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        // The lookbehind requires the "magic-string:" prefix plus any earlier list items.
        string pattern = @"(?<=^\s*magic-string:\s*\[(?:\s*""(?:\d+(?:\.\d+)*\s*"",)?)+)\d+(?:\.\d+)*";
        var matches = Regex.Matches(" magic-string: [ \"1.0\", \"1.2\", \"1.1\" ];", pattern, RegexOptions.IgnoreCase);
        Console.WriteLine(matches.Count);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Groups.Count);
            Console.WriteLine(match.Value);
            Console.WriteLine(match.Groups[1].Value); // empty: the pattern has no capturing group 1
        }
    }
}
https://dotnetfiddle.net/Kc2J2A

In languages that support variable-length lookbehinds (like .NET and JavaScript with ECMAScript 2018+):
(?<=^magic-string:\s*\[(?:\s*\\"(?:\d+(?:\.\d+)*\s*\\",)?)+)\d+(?:\.\d+)*
How it works:
(?<=^magic-string:\s*\[(?:\s*\\"(?:\d+(?:\.\d+)*\s*\\",)?)+) positive lookbehind ensuring what precedes matches the following
^magic-string:\s*\[ match the following
^ assert position at the start of the line
magic-string: match this literally
\s*\[ match any number of whitespace characters, followed by [ literally
(?:\s*\\"(?:\d+(?:\.\d+)*\s*\\",)?)+ match the following one or more times
\s*\\" match any number of whitespace characters, followed by \" literally
(?:\d+(?:\.\d+)*\s*\\",)? optionally match the following
\d+ match any digit one or more times
(?:\.\d+)* match . then one or more digits, any number of times (matches .1, .1.1, etc. where 1 is any number)
\s*\\", match any number of whitespace characters, followed by \", literally
\d+ match any digit one or more times
(?:\.\d+)* match . then one or more digits, any number of times (matches .1, .1.1, etc. where 1 is any number)
In simple terms, this matches all occurrences of 0, 0.0, 0.0.0, etc. that are preceded by magic-string: [\" followed by the substring 0.0\", \" repeated zero or more times (0.0 being a placeholder for all the formats that \d+(?:\.\d+)* matches).
You can use the following regex in languages that support \G and \K tokens (like PCRE):
(?:^magic-string:\s*\[|\G(?!\A)\s*\\",)\s*\\"\K\d+(?:\.\d+)*
How it works:
(?:^magic-string:\s*\[|\G(?!\A)\s*\\",) match either of the following options
^magic-string:\s*\[ match the following
^ assert position at the start of the line
magic-string: match this literally
\s*\[ match any number of whitespace characters, followed by [ literally
\G(?!\A)\s*\\", match the following
\G(?!\A) assert position at the end of the previous match (\G alone also matches at the start of the string; (?!\A) rules that case out)
\s*\\", match any number of whitespace characters, followed by \", literally
\s*\\"\K\d+(?:\.\d+)* match the following
\s*\\" match any number of whitespace characters, followed by \" literally
\K reset the starting point of the match, any previously consumed characters are no longer in the final match
\d+ match any digit one or more times
(?:\.\d+)* match . then one or more digits, any number of times (matches .1, .1.1, etc. where 1 is any number)
In simple terms, this matches all locations that are preceded by magic-string: [\" or by the position of a previous match followed by \", \".
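The .NET regex engine supports \G but not \K, so the same idea can be approximated in C# by capturing the version in a group instead of resetting the match start. The following is only a sketch under two assumptions: the input uses plain double quotes (as in the C# solution in the question), and the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MagicStringVersions
{
    // \G ties every later match to the end of the previous one; the capture
    // group stands in for \K, which .NET does not provide.
    private static readonly Regex VersionPattern = new Regex(
        @"(?:^\s*magic-string:\s*\[|\G(?!\A)\s*,)\s*""(\d+(?:\.\d+)*)\s*""");

    // Hypothetical helper: collects every version listed after "magic-string:".
    public static List<string> Extract(string input)
    {
        var versions = new List<string>();
        foreach (Match match in VersionPattern.Matches(input))
        {
            versions.Add(match.Groups[1].Value);
        }
        return versions;
    }

    public static void Main()
    {
        string good = "magic-string: [\"1.0.2.2 \", \"1.2\", \"1.1\"];";
        string bad = "any-random-string: [\"1.0.2.2 \", \"1.2\", \"1.1\"];";
        Console.WriteLine(string.Join(", ", Extract(good)));
        Console.WriteLine(Extract(bad).Count);
    }
}
```

Each match consumes the closing quote, so the next attempt starts at the comma and \G can chain from there. A line without the magic-string: prefix never produces a first match, so nothing can chain.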

Related

Match up to the comma - Regex

I have created a Regex Pattern (?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]
This pattern matches only:
TCC 6005_5,
What should I change at the end to match both of these strings:
TCC 6005-5 ,
TCC 6005_5,
You can add a non-greedy wildcard to your expression (.*?):
(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
                                           ^^^
This will now also match any characters between the last digit and the comma.
As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed this to (?:TCC|TCC_BHPB) which is presumably what your intention was.
This part of the pattern [TCC|TCC_BHPB] is a character class that matches one of the listed characters. It might also be written for example as [|_TCBHP]
To "match" both strings, you can match all parts instead of using a positive lookbehind.
\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,
\bTCC A word boundary to prevent a partial match, then match TCC
(?:_BHPB)?\s\d{3,4} Optionally match _BHPB, then match a whitespace char and 3-4 digits (in .NET, \d also matches non-ASCII Unicode digits; use [0-9] to match only the digits 0-9)
[-_\s]\d{1,2} Match one of - _ or a whitespace char, then 1-2 digits
\s?, Match an optional space and ,
Note that \s can also match a newline.
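As a quick check, the full-match pattern above can be run against both strings from the question. This is a minimal sketch; the class name, the method name, and the two extra sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TccMatcher
{
    // Matches the whole token instead of relying on a lookbehind.
    private static readonly Regex Pattern = new Regex(
        @"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,");

    public static bool Matches(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples = { "TCC 6005-5 ,", "TCC 6005_5,", "TCC_BHPB 600 5,", "TCC 6005-5" };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + Matches(sample));
        }
    }
}
```

The last sample has no trailing comma, so it is expected not to match.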
Using the lookbehind:
(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
Or, if you want to match horizontal whitespace (spaces and tabs) but not newlines:
\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,

How to extract middle values in regex

I am trying to extract the fifth and sixth value present in the stream through regex.
The stream is
12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47
I want values 200.00 and 100.00.
I tried ^(?:\S+\s+\n?){3,3} but it's selecting the string from beginning.
Can anybody help me please in getting the values that are present in the middle?
A quantifier like {3,3} can be written as {3}. Note that in the example string the values 200.00 and 100.00 are not the 5th and 6th values; they are the 6th and the 3rd.
With your pattern you only get the values at the beginning as the anchor ^ asserts the start of the string.
To get the third and the sixth value, you could also use 2 capture groups by using a quantifier {2} for the parts in between.
^(?:\S+\s+){2}(\S+)(?:\s+\S+){2}\s+(\S+)
^ Start of string
(?:\S+\s+){2} Repeat 2 times matching non whitespace chars followed by whitespace char
(\S+) Capture group 1, match 1+ non whitespace chars
(?:\s+\S+){2}\s+ Repeat 2 times matching whitespace chars and non whitespace chars
(\S+) Capture group 2, match 1+ non whitespace chars
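In C#, the two capture groups can be read from a single Match. The sketch below assumes the stream from the question; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MiddleValues
{
    // Returns the third and sixth whitespace-separated values,
    // or null when the input has fewer than six values.
    public static string[] ThirdAndSixth(string stream)
    {
        Match m = Regex.Match(stream, @"^(?:\S+\s+){2}(\S+)(?:\s+\S+){2}\s+(\S+)");
        if (!m.Success)
        {
            return null;
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }

    public static void Main()
    {
        string stream = "12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47";
        string[] values = ThirdAndSixth(stream);
        Console.WriteLine(values[0]); // third value
        Console.WriteLine(values[1]); // sixth value
    }
}
```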
Certainly, if you have access to the code itself, it would be easier to split the string and get nth chunk by its index.
If you are limited to a regex, you can use
(?<=^(?:\S+\s+){2})\S+
(?<=^(?:\S+\s+){5})\S+
Or, if there can be leading whitespace:
(?<=^\s*(?:\S+\s+){2})\S+
(?<=^\s*(?:\S+\s+){5})\S+
See a .NET regex demo.
Details:
(?<= - start of a positive lookbehind that requires the following sequence of patterns to appear immediately to the left of the current location:
^ - start of string
\s* - zero or more whitespaces
(?:\S+\s+){2} - two occurrences of 1+ non-whitespace chars followed with 1+ whitespace chars
) - end of the lookbehind
\S+ - one or more non-whitespace chars.
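Both routes mentioned in this answer, splitting and the lookbehind, can be sketched side by side in C#. The helper names and the 1-based index parameter are assumptions made for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthValue
{
    // Splits on any whitespace and picks the n-th chunk (1-based).
    public static string BySplit(string stream, int n)
    {
        string[] parts = stream.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return parts[n - 1];
    }

    // Builds the lookbehind pattern with n - 1 preceding values (1-based n).
    public static string ByRegex(string stream, int n)
    {
        string pattern = @"(?<=^\s*(?:\S+\s+){" + (n - 1) + @"})\S+";
        return Regex.Match(stream, pattern).Value;
    }

    public static void Main()
    {
        string stream = "12,097.00 435.00 100.00 43,037.00 3,090.00 200.00 86.00 45,890.47 7,570.00 51,514.47";
        Console.WriteLine(BySplit(stream, 3) + " " + BySplit(stream, 6));
        Console.WriteLine(ByRegex(stream, 3) + " " + ByRegex(stream, 6));
    }
}
```

Passing a null separator array makes Split treat whitespace as the delimiter. The regex variant relies on .NET's support for variable-length lookbehinds.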

How to check Equal Sign in string using REGEX in C#

I want to check strings that look like the following:
1st radius = 120
and
2nd radius = 'value'
Here is my code
v1 = new Regex(@"^[A-Za-z]+\s[=]\s[A-Za-z]+$");
if (v1.IsMatch(singleLine))
{
...
...
}
With @"^[A-Za-z]+\s[=]\s[A-Za-z]+$" the second string matches but not the first, and with @"^[A-Za-z]+\s[=]\s\d{0,3}$" only the first one matches.
I also want to match radius = 'val01'.
Based on your attempt, it looks as if you were trying to come up with
^[A-Za-z]+\s=\s(?:'[A-Za-z0-9]+'|\d{1,3})$
Details:
^ - start of string
[A-Za-z]+ - one or more ASCII letters
\s=\s - a = char enclosed with single whitespace chars
(?:'[A-Za-z0-9]+'|\d{1,3}) - a non-capturing group matching either
'[A-Za-z0-9]+' - ', then one or more ASCII letters or digits and then a '
| - or
\d{1,3} - one, two or three digits
$ - end of string (\z is safer for validation: $ also matches just before a final trailing newline, while \z matches only at the very end of the string; whether this matters depends on how you obtain the input).
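A minimal C# sketch of this check follows. It swaps $ for \z as suggested in the last point; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RadiusLine
{
    // \z instead of $ so that a trailing newline is rejected as well.
    private static readonly Regex Pattern = new Regex(
        @"^[A-Za-z]+\s=\s(?:'[A-Za-z0-9]+'|\d{1,3})\z");

    public static bool IsValid(string line)
    {
        return Pattern.IsMatch(line);
    }

    public static void Main()
    {
        string[] samples = { "radius = 120", "radius = 'value'", "radius = 'val01'", "radius = 1234" };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + IsValid(sample));
        }
    }
}
```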
If the pattern you tried ^[A-Za-z]+\s[=]\s[A-Za-z]+$ matches the second string radius = 'value', that means that 'value' consists of only chars A-Za-z.
In that case, you could either add matching digits to the second character class:
^[A-Za-z]+\s=\s[A-Za-z0-9]+$
If you want to match either 1-3 digits or at least a single char A-Za-z followed by optional digits:
^[A-Za-z]+\s=\s(?:[0-9]{1,3}|[A-Za-z]+[0-9]*)$
The pattern matches:
^ Start of string
[A-Za-z]+\s=\s Match the first part with chars A-Za-z and the = sign (Note that = does not have to be between square brackets)
(?: Non capture group
[0-9]{1,3} Match 1-3 digits (You can use \d{0,3} but that will also match an empty string due to the 0)
| Or
[A-Za-z]+[0-9]* Match 1+ chars A-Za-z followed by optional digits
) Close non capture group
$ End of string

Regex to get digits from a string when there is no separator between digits

I have a string like Acc:123-456-789 and another string like -1234567, I need your help to write an expression to match digits in case there is no separator between the digits.
-*(?!\d*(?:\d*-)$)\d*$
Input strings:
Acc:123-456-789 -12323232 7894596
Desired result:
group 1 12323232
group 2 7894596
I think this ought to work:
(?<=^|\s|\s-)(\d+)(?=\s|$)
Breaking it down:
(?<=^|\s|\s-) - A positive lookbehind that matches the start of the string, whitespace, or whitespace followed by a -.
(\d+) - Matches and captures number sequences.
(?=\s|$) - A positive lookahead that matches whitespace or the end of the string.
Note: If you need to capture negative number sequences, replace (\d+) with (\-?\d+).
Remember for use in C# that you need to escape backslashes or use the @ prefix to a string literal (@" ").
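Put together in C# with a verbatim string, the pattern can be applied as in the sketch below. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StandaloneDigits
{
    // Finds digit runs that are not joined to other digits by a separator.
    public static List<string> Find(string input)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, @"(?<=^|\s|\s-)(\d+)(?=\s|$)"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        foreach (string number in Find("Acc:123-456-789 -12323232 7894596"))
        {
            Console.WriteLine(number);
        }
    }
}
```

The 123-456-789 part is skipped because each of its digit runs is preceded by a colon or by a hyphen that does not follow whitespace.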

Regex that removes the 2 trailing letters from a string not preceded with other letters

This is in C#. I've been racking my brain, but no luck so far.
So for example
123456BVC --> 123456BVC (keep the same)
123456BV --> 123456 (remove trailing letters)
12345V --> 12345V (keep the same)
12345 --> 12345 (keep the same)
ABC123AB --> ABC123 (remove trailing letters)
It can start with anything.
I've tried @".*[a-zA-Z]{2}$" but no luck
This is in C# so that I always return a string removing the two trailing letters if they do exist and are not preceded with another letter.
Match result = Regex.Match(mystring, pattern);
return result.Value;
Your @".*[a-zA-Z]{2}$" regex matches any 0+ characters other than a newline (as many as possible) and 2 ASCII letters at the end of the string. You do not check the context, so the 2 letters are matched regardless of what comes before them.
You need a regex that will match the last two letters not preceded with a letter:
(?<!\p{L})\p{L}{2}$
Details:
(?<!\p{L}) - fails the match if a letter (\p{L}) is found before the current position (you may use [a-zA-Z] if you only want to deal with ASCII letters)
\p{L}{2} - 2 letters
$ - end of string.
In C#, use
var result = Regex.Replace(mystring, @"(?<!\p{L})\p{L}{2}$", string.Empty);
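To check the replacement against the examples from the question, a small sketch like the following can be used. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrailingLetters
{
    // Removes exactly two trailing letters unless a third letter precedes them.
    public static string Strip(string input)
    {
        return Regex.Replace(input, @"(?<!\p{L})\p{L}{2}$", string.Empty);
    }

    public static void Main()
    {
        string[] samples = { "123456BVC", "123456BV", "12345V", "12345", "ABC123AB" };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " --> " + Strip(sample));
        }
    }
}
```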
If you only want to remove the last two letters, regardless of what precedes them, you can simply do this:
string result = Regex.Replace(originalString, @"[A-Za-z]{2}$", string.Empty);
Remember that in regex $ matches at the end of the input or just before a final trailing newline.
