Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C#, if that matters.
Thanks for any help.

You could use
^(?:[^_]+_){2}(\d+)
Then take the first capture group; see a demo on regex101.com.
Broken down, this says:
^ # start of the string
(?:[^_]+_){2} # one or more non-_ chars followed by _, twice
(\d+) # capture digits
C# demo:
using System;
using System.Text.RegularExpressions;
var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern).Groups[1].Value;
Console.WriteLine(result); // => 20180208

Try this one:
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");
string sSecond = matches[1].Value; // the second 8-digit run
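Note that matches[1] throws an ArgumentOutOfRangeException when the input contains fewer than two 8-digit runs. A minimal sketch of a bounds-checked variant, assuming you would rather get null back than an exception; NthMatch is a hypothetical helper name, not a .NET API:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the value of the nth (1-based) match of pattern
// in input, or null when there are fewer than n matches.
string? NthMatch(string input, string pattern, int n)
{
    MatchCollection matches = Regex.Matches(input, pattern);
    // Guard the index instead of letting matches[n - 1] throw.
    return n >= 1 && n <= matches.Count ? matches[n - 1].Value : null;
}

string line = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
Console.WriteLine(NthMatch(line, @"\d{8}", 2)); // 20180208
Console.WriteLine(NthMatch(line, @"\d{8}", 5) ?? "(no such match)");
```

The sample line only contains three 8-digit runs (160742 is six digits), so asking for the 5th returns null.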

You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Demo
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date, use
^(?:.*?\d{8}_){0}.*?(\d{8})
Demo
C#'s regex engine performs the following operations:
^ # match the beginning of a line
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n-1} # execute the non-capture group n-1 times (n >= 1), skipping the first n-1 dates
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first .*?, followed by \d{8}, is lazy, so it consumes as few characters as possible: it stops at the first position from which the rest of the pattern can match. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789": the run 123456789 has 9 digits, so _\d{8}_ cannot match there, and the lazy group keeps extending until it reaches _12345678_.
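The {n-1} substitution described above can be wrapped in a small helper. This is a sketch that assumes n is 1-based and that a missing date should yield null; NthDate is a hypothetical name. Since n is an int, concatenating it into the pattern cannot inject regex syntax.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: builds ^(?:.*?\d{8}_){n-1}.*?(\d{8}) for the given n
// and returns capture group 1, or null when there is no nth date.
string? NthDate(string input, int n)
{
    if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
    string pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Groups[1].Value : null;
}

string sample = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
for (int i = 1; i <= 3; i++)
{
    Console.WriteLine($"{i}: {NthDate(sample, i)}");
}
```

For n = 4 the non-capture group would need a third "8 digits followed by _" sequence, which the sample does not contain, so the match fails and the helper returns null.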

You can use System.Text.RegularExpressions.Regex. See the following example:
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); // expression from Jan's answer; this just shows how to use it from C#
Match match = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // 20180208
}

Related

Regex to match words between underscores after second occurrence of underscore

So I would like to get the words between underscores after the second occurrence of an underscore.
This is my string:
ABC_BC_BE08_C1000004_0124
I've assembled this expression:
(?<=_)[^_]+
Well, it matches what I need, but it only skips the first word, since there is no underscore before it. I would like it to skip ABC and BC and just get the last three strings. I've tried messing around, but I am stuck and can't make it work. Thanks!

You can use a non-regex approach here with Split and Skip:
var text = "ABC_BC_BE08_C1000004_0124";
var result = text.Split('_').Skip(2);
foreach (var s in result)
Console.WriteLine(s);
Output:
BE08
C1000004
0124
See the C# demo.
With regex, you can use
var result = Regex.Matches(text, @"(?<=^(?:[^_]*_){2,})[^_]+").Cast<Match>().Select(x => x.Value);
See the regex demo and the C# demo. The regex matches:
(?<=^(?:[^_]*_){2,}) - a positive lookbehind that matches a location immediately preceded by the following patterns:
^ - start of string
(?:[^_]*_){2,} - two or more ({2,}) sequences of any zero or more chars other than _ ([^_]*) and then a _ char
[^_]+ - one or more chars other than _
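Because .NET supports variable-length lookbehind, the {2,} count can be turned into a parameter. A sketch, with FieldsAfter as a hypothetical helper that skips the first skip fields:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the _-separated fields after the first `skip` fields,
// using the variable-length lookbehind (?<=^(?:[^_]*_){skip,}).
string[] FieldsAfter(string text, int skip)
{
    string pattern = @"(?<=^(?:[^_]*_){" + skip + @",})[^_]+";
    return Regex.Matches(text, pattern)
        .Cast<Match>()
        .Select(m => m.Value)
        .ToArray();
}

string input = "ABC_BC_BE08_C1000004_0124";
Console.WriteLine(string.Join(", ", FieldsAfter(input, 2))); // BE08, C1000004, 0124
Console.WriteLine(string.Join(", ", FieldsAfter(input, 3))); // C1000004, 0124
```

A match can only start right after an underscore that is preceded by at least skip fields, which is why partial words such as "E08" are never returned.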

Using .NET, there is also a Captures collection that you can use with a regex and a repeated capture group.
^[^_]*_[^_]*(?:_([^_]+))+
The pattern matches:
^ Start of string
[^_]*_[^_]* Match any chars except _, then an _, then again any chars except _
(?: Non capture group
_([^_]+) Match _ and capture one or more chars other than _ in group 1
)+ Close the non capture group and repeat 1 or more times
.NET regex demo | C# demo
For example:
var pattern = @"^[^_]*_[^_]*(?:_([^_]+))+";
var str = "ABC_BC_BE08_C1000004_0124";
// Every repetition of group 1 is kept in its Captures collection
var strings = Regex.Match(str, pattern).Groups[1].Captures.Cast<Capture>().Select(c => c.Value);
foreach (string s in strings)
{
Console.WriteLine(s);
}
Output
BE08
C1000004
0124
If you want to match only word characters between the underscores, another option is the negated character class [^\W_], which matches any word character except the underscore:
^[^\W_]*_[^\W_]*(?:_([^\W_]+))+

Regex matching excluding a specific context

I'm trying to search a string for words within single quotes, but only if those single quotes are not within parentheses.
Example string:
something, 'foo', something ('bar')
So for the given example I'd like to match foo, but not bar.
After searching for regex examples I'm able to match within single quotes (see below code snippet), but am not sure how to exclude matches in the context previously described.
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"'([^']*)");
if (name.Success)
{
string matchedName = name.Groups[1].Value;
Console.WriteLine(matchedName);
}

I would recommend using lookarounds (a lookbehind and a lookahead) instead (see it live):
(?<!\()'([^']*)'(?!\))
Or with C#:
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"(?<!\()'([^']*)'(?!\))");
if (name.Success)
{
Console.WriteLine(name.Groups[1].Value);
}

The easiest way to get what you need is an alternation: match and capture what you need, and only match (without capturing) what you do not need:
\([^()]*\)|'([^']*)'
See the regex demo
Details:
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
| - or
' - a '
([^']*) - Group 1 capturing 0+ chars other than '
' - a single quote.
In C#, use .Groups[1].Value to get the values you need, skipping the matches where Group 1 did not participate (the parenthesized parts). See the online demo:
var str = "something, 'foo', something ('bar')";
var result = Regex.Matches(str, @"\([^()]*\)|'([^']*)'")
.Cast<Match>()
.Where(m => m.Groups[1].Success)
.Select(m => m.Groups[1].Value)
.ToList();
Another alternative is the one mentioned by Thomas, but since it is .NET, you may use infinite-width lookbehind:
(?<!\([^()]*)'([^']*)'(?![^()]*\))
See this regex demo.
Details:
(?<!\([^()]*) - a negative lookbehind that fails the match if there is a ( followed by 0+ chars other than ( and ) immediately to the left of the current location
'([^']*)' - a quote, 0+ chars other than single quote captured into Group 1, and another single quote
(?![^()]*\)) - a negative lookahead that fails the match if there are 0+ chars other than ( and ) followed with ) right after the ' from the preceding subpattern.
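Since you want the text without the surrounding ' characters, read it from .Groups[1].Value as in the code above; with this pattern every match captures Group 1, so no filtering is needed.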
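The approaches differ when there is whitespace between a parenthesis and a quote. A small comparison sketch on a made-up input: the adjacent lookarounds (?<!\()...(?!\)) only inspect the single character next to each quote, whereas the alternation and the infinite-width lookbehind look across the whole parenthesized span. Quoted is a hypothetical helper name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Made-up input: note the spaces inside the parentheses.
string line = "something, 'foo', something ( 'bar' )";

// Returns the Group 1 values of every match in which Group 1 participated.
string[] Quoted(string input, string pattern) =>
    Regex.Matches(input, pattern)
        .Cast<Match>()
        .Where(m => m.Groups[1].Success)
        .Select(m => m.Groups[1].Value)
        .ToArray();

string[] adjacent = Quoted(line, @"(?<!\()'([^']*)'(?!\))");
string[] alternation = Quoted(line, @"\([^()]*\)|'([^']*)'");
string[] lookbehind = Quoted(line, @"(?<!\([^()]*)'([^']*)'(?![^()]*\))");

Console.WriteLine(string.Join(", ", adjacent));    // foo, bar
Console.WriteLine(string.Join(", ", alternation)); // foo
Console.WriteLine(string.Join(", ", lookbehind));  // foo
```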

How to change regex to handle _ symbol after a number?

I want to have a regex that can match the following names:
Standard_DS1_v2
Standard_DS2_v2
...
Standard_DS15_v2
Standard_DS1
Standard_DS2
...
Standard_DS14
Standard_GS1
Standard_GS2
...
Standard_GS5
Now I have the following regex:
private const string PremiumStorageRoleSizesRegex = @"^Standard_((DS)|(GS))\d+$";
machineSizeMetadatas = machineSizeMetadatas.Where(metadata => Regex.IsMatch(metadata.Name, PremiumStorageRoleSizesRegex));
But I don't know how to handle the first sequence (Standard_DS1_v2).
How can I change my regex?

You may use this regex, which adds the optional group (?:_v\d+)? at the end:
^Standard_(DS|GS)\d+(?:_v\d+)?$
Or use the simpler ^Standard_[DG]S\d+(?:_v\d+)?$ to avoid the alternation group. See the regex demo.
You may add a capturing group wherever you need.
Pattern details:
^ - start of string
Standard_ - a literal Standard_ substring
(DS|GS) - either a DS or GS string (you may replace it with [DG]S)
\d+ - 1 or more digits
(?:_v\d+)? - 1 or 0 (=optional) sequences of _ + v + 1 or more digits
$ - end of string.
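A sketch of the filter from the question using the simpler pattern, run over a plain string array instead of machineSizeMetadatas (the last three names are made up to show what gets rejected):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The simpler pattern from the answer: DS or GS, digits, optional _v + digits.
const string PremiumStorageRoleSizesRegex = @"^Standard_[DG]S\d+(?:_v\d+)?$";

string[] names =
{
    "Standard_DS1_v2", "Standard_DS15_v2", "Standard_DS14", "Standard_GS5",
    "Standard_D1", "Standard_DS1_v", "Standard_GS2_v2_x",
};

string[] premium = names
    .Where(name => Regex.IsMatch(name, PremiumStorageRoleSizesRegex))
    .ToArray();

Console.WriteLine(string.Join(", ", premium));
// Standard_DS1_v2, Standard_DS15_v2, Standard_DS14, Standard_GS5
```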

Parameter regex

I need help with creating regex for validating parameter string.
The parameter string consists of 2 optional groups of specific characters. The first group can contain each of the characters P, O and Z at most once (order doesn't matter). The second group has the same restriction but can contain only the characters t, c, p and m. If both groups are present, they must be separated by a single space character.
So valid strings are:
P t
PO t
OZP ct
P tcmp
P
PZ
t
tp
etc.

Why not ditch the regex, stop using a string to represent non-string data, and do this:
[Flags]
enum First
{
None = 0,
P = 1,
O = 2,
Z = 4
}
[Flags]
enum Second
{
None = 0,
T = 1,
C = 2,
P = 4,
M = 8
}
void YourMethod(First first, Second second)
{
bool hasP = first.HasFlag(First.P);
var hasT = second.HasFlag(Second.T);
}
You could then call YourMethod like this:
// equivalent to "PO mp", but checked at compile time.
YourMethod(First.P | First.O, Second.M | Second.P);
or, if you felt like it
// same as above.
YourMethod((First)3, (Second)12);
If you'd like to know more about how this works see this question.

I don't think a regex is a good solution here, because it will have to be quite complicated:
Regex regexObj = new Regex(
@"^ # Start of string
(?: # Start non-capturing group:
([POZ]) # Match and capture one of [POZ] in group 1
(?![POZ]*\1) # Assert that that character doesn't show up again
)* # Repeat any number of times (including zero)
(?: # Start another non-capturing group:
(?<!^) # Assert that we're not at the start of the string
\ # Match a space
(?!$) # Assert that we're also not at the end of the string
)? # Make this group optional.
(?<! # Now assert that we're not right after...
[POZ] # one of [POZ] (i. e. make sure there's a space)
(?!$) # unless we're already at the end of the string.
) # End of negative lookbehind assertion
(?: # Start yet another non-capturing group:
([tcpm]) # Match and capture one of [tcpm] in group 2
(?![tcpm]*\2) # Assert that that character doesn't show up again
)* # Repeat any number of times (including zero)
$ # End of string",
RegexOptions.IgnorePatternWhitespace);
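To check the pattern against the examples from the question, here is a sketch that uses a single-line equivalent of the commented pattern above (same tokens, with the comments dropped and the escaped space written as a literal space, so RegexOptions.IgnorePatternWhitespace is not needed). The invalid inputs are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Single-line form of the commented pattern: unique [POZ] chars, an optional single
// space (not at the start or end), then unique [tcpm] chars. The negative lookbehind
// rejects a [POZ] char directly followed by anything but the end of the string,
// which forces the space when both groups are present.
var parameterRegex = new Regex(
    @"^(?:([POZ])(?![POZ]*\1))*(?:(?<!^) (?!$))?(?<![POZ](?!$))(?:([tcpm])(?![tcpm]*\2))*$");

string[] valid = { "P t", "PO t", "OZP ct", "P tcmp", "P", "PZ", "t", "tp" };
string[] invalid = { "PP t", "Pt", "P tt", " t", "P ", "t P" };

foreach (string s in valid.Concat(invalid))
{
    Console.WriteLine($"\"{s}\" => {parameterRegex.IsMatch(s)}");
}
```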

This should give you what you need:
([POZ]+)? ?([tcpm]+)?

Basic regex for 16 digit numbers

I currently have a regex that pulls a 16-digit number out of a file, e.g.:
Regex:
Regex.Match(l, @"\d{16}")
This works well for a number like this one:
1234567891234567
But how could I also match numbers such as:
1234 5678 9123 4567
and
1234-5678-9123-4567

If all groups are always 4 digits long:
\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b
To make sure the delimiter is the same between groups:
\b\d{4}(| |-)\d{4}\1\d{4}\1\d{4}\b
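A quick way to see the difference between the two patterns is to run both over the sample formats from the question plus one made-up input with mixed delimiters:

```csharp
using System;
using System.Text.RegularExpressions;

// Lenient: each delimiter is independently a space, a dash or nothing.
const string Lenient = @"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b";
// Strict: group 1 captures the first delimiter (possibly empty) and \1 repeats it.
const string Strict = @"\b\d{4}(| |-)\d{4}\1\d{4}\1\d{4}\b";

string[] samples =
{
    "1234567891234567", "1234 5678 9123 4567", "1234-5678-9123-4567", "1234-5678 9123 4567",
};

foreach (string s in samples)
{
    Console.WriteLine($"{s,-20} lenient={Regex.IsMatch(s, Lenient)} strict={Regex.IsMatch(s, Strict)}");
}
```

Only the last, mixed sample is rejected by the strict pattern, because \1 forces the later delimiters to equal the first one.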

If it's always either all together or in groups of four, then one way to do this with a single regex is something like:
Regex.Match(l, @"\d{16}|\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}")

You could try something like:
^([0-9]{4}[\s-]?){3}([0-9]{4})$
That should do the trick.
Please note:
This also allows
1234-5678 9123 4567
It's not strict on only dashes or only spaces.

Another option is to just use the regex you currently have, and strip all offending characters out of the string before you run the regex:
var input = fileValue.Replace("-",string.Empty).Replace(" ",string.Empty);
Regex.Match(input, @"\d{16}");

Here is a pattern that captures all the digits and strips out the dashes or spaces. Note that it also validates that there are only 16 digits. The IgnorePatternWhitespace option is only there so the pattern can contain whitespace and # comments; it is for readability.
string value = "1234-5678-9123-4567";
string pattern = @"
^ # Beginning of line
( # Place into capture groups for 1 match
(?<Number>\d{4}) # Place into named group capture
(?:[\s-]?) # Allow for a space or dash optional
){4} # Get 4 groups
(?!\d) # 17th number, do not match! abort
$ # End constraint to keep int in 16 digits
";
var result = Regex.Match(value, pattern, RegexOptions.IgnorePatternWhitespace)
.Groups["Number"].Captures
.OfType<Capture>()
.Aggregate (string.Empty, (seed, current) => seed + current);
Console.WriteLine ( result ); // 1234567891234567
// Shows False due to 17 numbers!
Console.WriteLine ( Regex.IsMatch("1234-5678-9123-45678", pattern, RegexOptions.IgnorePatternWhitespace));

We Keep Coding
