How to change regex to handle _ symbol after a number? - c#

I want to have a regex that can match the following names:
Standard_DS1_v2
Standard_DS2_v2
...
Standard_DS15_v2
Standard_DS1
Standard_DS2
...
Standard_DS14
Standard_GS1
Standard_GS2
...
Standard_GS5
Now I have the following regex:
private const string PremiumStorageRoleSizesRegex = #"^Standard_((DS)|(GS))\d+$";
machineSizeMetadatas = machineSizeMetadatas.Where(metadata => Regex.IsMatch(metadata.Name, PremiumStorageRoleSizesRegex));
But I don't know how to handle the first sequence (Standard_DS1_v2).
How can I change my regex?

You may use this regex with an optional group:
^Standard_(DS|GS)\d+(?:_v\d+)?$
^^^^^^^^^^
Or a simpler ^Standard_[DG]S\d+(?:_v\d+)?$ to avoid alternation group. See the regex demo
You may add a capturing group wherever you need.
Pattern details:
^ - start of string
Standard_ - a literal Standard_ substring
(DS|GS) - either a DS or GS string (you may replace it with [DG]S)
\d+ - 1 or more digits
(?:_v\d+)? - 1 or 0 (=optional) sequences of _ + v + `1 or more digits
$ - end of string.

Related

Regex to match words between underscores after second occurence of underscore

so i would like to get words between underscores after second occurence of underscore
this is my string
ABC_BC_BE08_C1000004_0124
I've assembled this expresion
(?<=_)[^_]+
well it matches what i need but only skips the first word since there is no underscore before it. I would like it to skip ABC and BC and just get the last three strings, i've tried messing around but i am stuck and cant make it work. Thanks!
You can use a non-regex approach here with Split and Skip:
var text = "ABC_BC_BE08_C1000004_0124";
var result = text.Split('_').Skip(2);
foreach (var s in result)
Console.WriteLine(s);
Output:
BE08
C1000004
0124
See the C# demo.
With regex, you can use
var result = Regex.Matches(text, #"(?<=^(?:[^_]*_){2,})[^_]+").Cast<Match>().Select(x => x.Value);
See the regex demo and the C# demo. The regex matches
(?<=^(?:[^_]*_){2,}) - a positive lookbehind that matches a location that matches the following patterns immediately to the left of the current location:
^ - start of string
(?:[^_]*_){2,} - two or more ({2,}) sequences of any zero or more chars other than _ ([^_]*) and then a _ char
[^_]+ - one or more chars other than _
Usign .NET there is also a captures collection that you might use with a regex and a repeated catpure group.
^[^_]*_[^_]*(?:_([^_]+))+
The pattern matches:
^ Start of string
[^_]*_[^_]* Match any char except an _, match _ and again any char except _
(?: Non capture group
_([^_]+) Match _ and capture 1 or more times any char except _ in group 1
)+ Close the non capture group and repeat 1 or more times
.NET regex demo | C# demo
For example:
var pattern = #"^[^_]*_[^_]*(?:_([^_]+))+";
var str = "ABC_BC_BE08_C1000004_0124";
var strings = Regex.Match(str, pattern).Groups[1].Captures.Select(c => c.Value);
foreach (String s in strings)
{
Console.WriteLine(s);
}
Output
BE08
C1000004
0124
If you want to match only word characters in between the underscores, another option for a pattern could be using a negated character class [^\W_] excluding the underscore from the word characters in between:
^[^\W_]*_[^\W_]*(?:_([^\W_]+))+

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there away to do this purely in the regex, with have to resort to pulling out the 2 match in code. I'm using C# if that matters.
Thanks for any help.
You could use
^(?:[^_]+_){2}(\d+)
And take the first group, see a demo on regex101.com.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = #"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208
Try this one
MatchCollection matches = Regex.Matches(sInputLine, #"\d{8}");
string sSecond = matches[1].ToString();
You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Demo
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
Demo
The C#'s regex engine performs the following operations.
^ # match the beginning of a line
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first instance of .*?, followed by \d{8}, because it is lazy, will gobble up as many characters as it can until the next 8 characters are digits (and are not preceded or followed by a digit. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(#"^(?:[^_]+_){2}(\d+)"); //Expression from Jan's answer just showing how to use C# to achieve your goal
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
Console.WriteLine(groups[1].Value);
}

Regex containing two specific characters

I have the following regex, which is not working:
#"^[a-z]{1}[a-z0-9\-_(%i)]*$"
The user is allowed to use %i, but only in this combination. Only % is not allowed. The expression in parentheses does not work.
The user input could be for example:
testing123%i
testing123
testing-%i-123
But this is not allowed:
testing%123
A character class only matches 1 single char. You need to take %i out of the character class if you want to match %i as a sequence:
^[a-z](?:[a-z0-9_-]|%i)*$
See the regex demo
Details:
^ - start of a string
[a-z] - a lowervase ASCII letter
(?:[a-z0-9_-]|%i)* - zero or more occurrences of:
[a-z0-9_-] - a lowercase ASCII letter, a digit, _ or -
| - or
%i - a literal char sequence %i
$ - end of string.
string pattern = #"\b(?!(?:.\B)(.)(?:\B.)\1)[%i]+\b";
string input = "testing123%i";
if (Regex.IsMatch(input, pattern))
{
return true;
}

Regex matching excluding a specific context

I'm trying to search a string for words within single quotes, but only if those single quotes are not within parentheses.
Example string:
something, 'foo', something ('bar')
So for the given example I'd like to match foo, but not bar.
After searching for regex examples I'm able to match within single quotes (see below code snippet), but am not sure how to exclude matches in the context previously described.
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, #"'([^']*)");
if (name.Success)
{
string matchedName = name.Groups[1].Value;
Console.WriteLine(matchedName);
}
I would recommend using lookahead instead (see it live) using:
(?<!\()'([^']*)'(?!\))
Or with C#:
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, #"(?<!\()'([^']*)'(?!\))");
if (name.Success)
{
Console.WriteLine(name.Groups[1].Value);
}
The easiest way to get what you need is to use an alternation group and match and capture what you need and only match what you do not need:
\([^()]*\)|'([^']*)'
See the regex demo
Details:
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
| - or
' - a '
([^']*) - Group 1 capturing 0+ chars other than '
' - a single quote.
In C#, use .Groups[1].Value to get the values you need. See the online demo:
var str = "something, 'foo', something ('bar')";
var result = Regex.Matches(str, #"\([^()]*\)|'([^']*)'")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
Another alternative is the one mentioned by Thomas, but since it is .NET, you may use infinite-width lookbehind:
(?<!\([^()]*)'([^']*)'(?![^()]*\))
See this regex demo.
Details:
(?<!\([^()]*) - a negative lookbehind failing the match if there is ( followed with 0+ chars other than ( and ) up to
'([^']*)' - a quote, 0+ chars other than single quote captured into Group 1, and another single quote
(?![^()]*\)) - a negative lookahead that fails the match if there are 0+ chars other than ( and ) followed with ) right after the ' from the preceding subpattern.
Since you'd want to exclude ', the same code as above applies.

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = #"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end or variable r I get True. If I add an "=\w" after the \w it's False. I want the characters between # and = to be able to be any alphanumeric value. Anything after the = sign can have alphanumeric and ' (single quotes). What am I doing wrong here. I very rarely have used regular expressions and normally can find example, this is custom format though and even with cheatsheets I'm having issues.
^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^ start of line
([Ii]|[Oo]) either (I or i) or (O or o)
\w+ 1 or more word characters
= equals sign
(?<q>'?) capture 0 or 1 quotes in named group q
[\w\d]+ 1 or more word or digit characters
\k<q> repeat of what was captured in named group q
$ end of line
use \w+ instead of \w to one character or more. Or \w* to get zero or more:
Try this: Live demo
^([Ii]|[Oo])#\w+=\'*\w+\'*
If you are being a bit more strict with using paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?
Here is a demo
You could try something like this:
Regex rx = new Regex( #"^([IO])#(\w+)=(.*)$" , RegexOptions.IgnoreCase ) ;
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches 1 or more non quoted character; the second alternative match a quoted string literal with any internal (embedded) quotes escape by doubling them, so it would match things like
'' (the empty string
'foo bar'
'That''s All, Folks!'

Categories