How to match pairs of characters using Regex?

I have a variable containing a string. The string contains only alphabetic and numeric characters and has a fixed length of 32 characters. How can I use a regular expression to check whether the string consists only of paired runs of characters, with a run length of 2, 4, 8, or 16?
For example, for strings like this:
abcdefghijklmnopqrstuvwxyz012345
Regex.IsMatch must return false.
But for strings like these:
aaaaaaaaaaaaaaaa5555555555555555
(16-character pairs)
aaaaaaaa55555555aaaaaaaa55555555
(8-character pairs)
aaaa5555aaaa5555aaaa5555aaaa5555
(4-character pairs)
aa55aa55aa55aa55aa55aa55aa55aa55
(2-character pairs)
Regex.IsMatch must return true.

EDIT
Apparently, the requirement is simply to match e.g. aabbccddeeffgghhiijjkkllmmnnoopp, i.e. the first two characters must be the same, then the next two, and so on for exactly 32 characters. That can easily be tested with:
((\w)\2(\w)\3){8}
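Without anchors, Regex.IsMatch with the pattern above also succeeds on a longer string that merely contains such a run. Here is a minimal C# sketch, assuming the whole 32-character string must consist of adjacent identical pairs. The class and method names are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class AdjacentPairs
{
    // (\w)\1 matches one character followed by a copy of itself;
    // {16} repeats that 16 times, and ^...$ pins it to the whole string.
    private static readonly Regex Pattern = new Regex(@"^(?:(\w)\1){16}$");

    public static bool IsAllPairs(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

AdjacentPairs.IsAllPairs("aabbccddeeffgghhiijjkkllmmnnoopp") returns true, while the alphabet example from the question returns false.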

This should work (without needing to come up with individual regexes for each possible combination).
public bool isRelevantMatch(string inputString)
{
    // Count non-overlapping occurrences of a letter followed by the same letter.
    // Note that [a-zA-Z] covers letters only, not digits.
    int matchCount = Regex.Matches(inputString, @"([a-zA-Z])\1{1}").Count;
    return matchCount == 1 ||
           matchCount == 2 ||
           matchCount == 4 ||
           matchCount == 8 ||
           matchCount == 16;
}
Explanation: get the count of matches of repeated characters (using a backreference regex to match any instance of aa, AA, bb, BB, etc.). If that count is 1, 2, 4, 8, or 16, return true.

Late to this, but I'll throw this out there.
If you want to match repeating 'unique pairs', this works in Perl.
I tried to make it smaller but couldn't figure out how.
The syntax for .NET is probably the same. However, I've reused the capture group names, which works in Perl, but I'm not sure about .NET (it should be OK; if not, change them to unique names).
Also, in Perl I could have used a branch reset to overlay the capture groups and then tested a single group's length to get the repeat order, but this is not available in .NET.
So you just have to test 4 groups for a match (or length) to get the order.
# (?<A>(?<b>\w)\k<b>(?!\k<b>)(?<c>\w)\k<c>)\k<A>{7}|(?<A>(?<b>\w)\k<b>{3}(?!\k<b>)(?<c>\w)\k<c>{3})\k<A>{3}|(?<A>(?<b>\w)\k<b>{7}(?!\k<b>)(?<c>\w)\k<c>{7})\k<A>{1}|(?<A>(?<b>\w)\k<b>{15}(?!\k<b>)(?<c>\w)\k<c>{15})
(?<A>                  # (1 start), 2 char pairs, repeating x 8
    (?<b> \w )         # (2)
    \k<b>
    (?! \k<b> )
    (?<c> \w )         # (3)
    \k<c>
)                      # (1 end)
\k<A>{7}
|
(?<A>                  # (4 start), 4 char pairs, repeating x 4
    (?<b> \w )         # (5)
    \k<b>{3}
    (?! \k<b> )
    (?<c> \w )         # (6)
    \k<c>{3}
)                      # (4 end)
\k<A>{3}
|
(?<A>                  # (7 start), 8 char pairs, repeating x 2
    (?<b> \w )         # (8)
    \k<b>{7}
    (?! \k<b> )
    (?<c> \w )         # (9)
    \k<c>{7}
)                      # (7 end)
\k<A>{1}
|
(?<A>                  # (10 start), 16 char pairs
    (?<b> \w )         # (11)
    \k<b>{15}
    (?! \k<b> )
    (?<c> \w )         # (12)
    \k<c>{15}
)                      # (10 end)
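The four alternatives differ only in their repeat counts, so one way to avoid writing them out by hand in C# is to build each pattern in a loop. This is a rough sketch rather than a port of the Perl pattern: it uses numbered groups instead of named ones, anchors the match to the whole string, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class PairBlocks
{
    // Returns the run length (2, 4, 8 or 16) if the 32-character input is one
    // unit of two different runs repeated to fill the string; otherwise 0.
    public static int RunLength(string input)
    {
        foreach (int n in new[] { 2, 4, 8, 16 })
        {
            int repeats = 16 / n; // how many times the two-run unit occurs in 32 chars
            // Group 1 = the whole unit, group 2 = first character, group 3 = second.
            // (?!\2) ensures the second run uses a different character.
            string pattern =
                @"^((\w)\2{" + (n - 1) + @"}(?!\2)(\w)\3{" + (n - 1) + @"})\1{" + (repeats - 1) + @"}$";
            if (Regex.IsMatch(input, pattern))
            {
                return n;
            }
        }
        return 0;
    }
}
```

Returning the run length rather than a bool replaces the "test 4 groups to get the order" step, since the loop already knows which size matched.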

Related

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C# if that matters.
Thanks for any help.

You could use
^(?:[^_]+_){2}(\d+)
And take the first group, see a demo on regex101.com.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208

Try this one
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");
string sSecond = matches[1].ToString();
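Indexing matches[1] throws when the input contains fewer than two matches. Here is a more defensive sketch, assuming a date is a run of exactly eight digits that is not part of a longer digit run. The lookarounds and the helper name are my additions, not part of the answer above.

```csharp
using System.Text.RegularExpressions;

public static class DateRuns
{
    // Returns the n-th (1-based) run of exactly eight digits,
    // or null if the input has fewer than n such runs.
    public static string NthEightDigitRun(string input, int n)
    {
        // The lookarounds stop a match from starting or ending inside a longer digit run.
        MatchCollection matches = Regex.Matches(input, @"(?<!\d)\d{8}(?!\d)");
        return n >= 1 && n <= matches.Count ? matches[n - 1].Value : null;
    }
}
```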

You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
The .NET regex engine performs the following operations.
^ # match the beginning of the string
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first instance of .*?, because it is lazy, consumes as few characters as possible: it stops as soon as the text that follows can match the rest of the pattern. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
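The "replace {1} with {n-1}" rule can be wrapped in a small helper that builds the pattern for a given n. This is only a sketch under the same assumption as the answer, namely that every date before the nth one is followed by an underscore. The class and method names are invented.

```csharp
using System.Text.RegularExpressions;

public static class NthDate
{
    // Returns the n-th (1-based) eight-digit date, or null if there is none.
    public static string Find(string input, int n)
    {
        if (n < 1)
        {
            return null;
        }
        // Skip n-1 dates that are each followed by '_', then capture the next 8 digits.
        string pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```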

You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); // Expression from Jan's answer, just showing how to use C# to achieve your goal
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
    Console.WriteLine(groups[1].Value);
}

Making a group with spaces between words and decimals

I'm not new to the concept of regex but the syntax and semantics of everything get confusing for me at times. I have been trying to create a pattern to recognize
Ambient Relative Humidity: 31.59
With the grouping
Ambient Relative Humidity (Group 1)
31.59 (Group 2)
But I also need to be able to match things such as
Operator: Bob
With the grouping
Operator (Group 1)
Bob (Group 2)
Or
Sensor Number: 0001
With the grouping
Sensor Number (Group 1)
0001 (Group 2)
Here is the current pattern I created which works for the examples involving operator and sensor number but does not match with the first example (ambient humidity)
\s*([A-Za-z0-9]*\s*?[A-Za-z0-9]*)\s*:\s*([A-Za-z0-9]*)

You have to add more space separated key parts to the regex.
Also, you have to add an option for decimal numbers in the value.
Something like this ([A-Za-z0-9]*(?:\s*[A-Za-z0-9]+)*)\s*:\s*((?:\d+(?:\.\d*)?|\.\d+)|[A-Za-z0-9]+)?
https://regex101.com/r/fl0wtb/1
Explained
( # (1 start), Key
[A-Za-z0-9]*
(?: \s* [A-Za-z0-9]+ )*
) # (1 end)
\s* : \s*
( # (2 start), Value
(?: # Decimal number
\d+
(?: \. \d* )?
| \. \d+
)
| # or,
[A-Za-z0-9]+ # Alpha num's
)? # (2 end)
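For reference, here is a minimal C# sketch of the same idea applied one line at a time. It differs from the pattern above in a few assumed ways: it anchors to the whole line, requires at least one word in the key, and requires a value. The class name and the choice to return an array are illustrative only.

```csharp
using System.Text.RegularExpressions;

public static class KeyValueLine
{
    // Key: one or more alphanumeric words separated by whitespace.
    // Value: a decimal number, or a single alphanumeric word.
    private static readonly Regex Pattern = new Regex(
        @"^\s*([A-Za-z0-9]+(?:\s+[A-Za-z0-9]+)*)\s*:\s*(\d+(?:\.\d+)?|[A-Za-z0-9]+)\s*$");

    // Returns { key, value }, or null when the line does not match.
    public static string[] Parse(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? new[] { m.Groups[1].Value, m.Groups[2].Value } : null;
    }
}
```

Because the key group starts and ends on a word, it carries no leading or trailing spaces, so no trimming is needed afterwards.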

I may have posted too soon without thinking, I now have the following expression
\s*([A-Za-z0-9]*\s*[A-Za-z0-9]*\s*[A-Za-z0-9]*)\s*:\s*([A-Za-z0-9.]*)
The only thing is that it includes spaces sometimes that I was trying to avoid but I can just trim those later. Sorry for posting so soon!

var st = "Ambient Relative Humidity: 31.59 Operator: Bob Sensor Number: 0001";
var li = Regex.Matches(st, @"([\w]+?:)\s+(\d+\.?\d+|\w+)").Cast<Match>().ToList();
foreach (var t in li)
{
    Console.WriteLine($"Group 1 {t.Groups[1]}");
    Console.WriteLine($"Group 2 {t.Groups[2]}");
}
//Group 1 Humidity:
//Group 2 31.59
//Group 1 Operator:
//Group 2 Bob
//Group 1 Number:
//Group 2 0001

regex expression help needed

I was wondering if this is possible using Regex. I would like to exclude all letters (upper and lowercase) and the following 14 characters: ! " & ' * + , : ; < = > # _
The problem is the equal sign. In the string that I will be validating (which must be either 20 or 37 characters long), the equal sign must be in either the 17th or the 20th position, because it is used as a separator in those positions. So the regex must reject an equal sign anywhere other than the 17th or 20th position, and it must not appear in both. The following are some examples:
pass: 1234567890123456=12345678901234567890
pass: 1234567890123456789=12345678901234567
don't pass: 123456=890123456=12345678901234567
don't pass: 1234567890123456=12=45678901234567890
I am having a hard time with the part that I must allow the equal sign in those two positions and not sure if that's possible with Regex. Adding an if-statement would require substantial code change and regression testing because this function that stores this regex currently is used by many different plug-ins.

I'll go for
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Explanations :
1) Start with your allowed char :
^[^a-zA-Z!"&'*+,:;<=>#_]$
[^xxx] means all except xxx, where a-z is lower case letters A-Z upper case ones, and your others chars
2) Repeat it 16 times, then =, then others allowed chars ("allowed char" followed by '+' to tell that is repeated 1 to n times)
^[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+$
At this point you'll match your first case, when = is at position 17.
3) Your second case will be
^[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*$
with the last part followed by * instead of + to handle strings that are only 20 chars long and that ends with =
4) just use the (case1|case2) to handle both
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]+|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]*)$
Tested OK with notepad++ and your examples
Edit to match exactly 20 or 37 chars
^([^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{3}|[^a-zA-Z!"&'*+,:;<=>#_]{16}=[^a-zA-Z!"&'*+,:;<=>#_]{20}|[^a-zA-Z!"&'*+,:;<=>#_]{19}=|[^a-zA-Z!"&'*+,:;<=>#_]{19}=[^a-zA-Z!"&'*+,:;<=>#_]{17})$
More readable view with explanation :
^(
    // 20 chars with = at 17
    [^a-zA-Z!"&'*+,:;<=>#_]{16}   // 16 allowed chars
    =                             // followed by =
    [^a-zA-Z!"&'*+,:;<=>#_]{3}    // followed by 3 allowed chars
  |
    // 37 chars with = at 17
    [^a-zA-Z!"&'*+,:;<=>#_]{16}
    =
    [^a-zA-Z!"&'*+,:;<=>#_]{20}
  |
    // 20 chars with = at 20
    [^a-zA-Z!"&'*+,:;<=>#_]{19}
    =
  |
    // 37 chars with = at 20
    [^a-zA-Z!"&'*+,:;<=>#_]{19}
    =
    [^a-zA-Z!"&'*+,:;<=>#_]{17}
)$

I've left out the other symbols and used [^=] as a placeholder in the second branch; replace it with a character class covering all allowed symbols except =.
var r = new Regex(@"^(([0-9\:\<\>]{16,16}=(([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))|(^[^=]{19,19}=(([0-9\:\<\>]{17}))?))$");
/*
@"^(
    ([0-9\:\<\>]{16,16}
    =
    (([0-9\:\<\>]{20})|([0-9\:\<\>]{3})))
  |
    (^[^=]{19,19}
    =
    (([0-9\:\<\>]{17}))?)
)$"
*/
Using {length,length} fixes each segment to an exact length, which in turn fixes the overall string length. The ^ at the beginning and the $ at the end are also important.
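Another way to express the length rule is a lookahead, which avoids writing out a separate alternative for every length and position combination. This is a sketch only. The excluded-character set is copied from the question, \z is used instead of $ so that a trailing newline is not tolerated, and the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SeparatorCheck
{
    // One character that is neither a letter nor one of the excluded symbols.
    // Inside a verbatim string, "" stands for a single double quote.
    private const string Allowed = @"[^a-zA-Z!""&'*+,:;<=>#_]";

    private static readonly Regex Pattern = new Regex(
        @"^(?=(?:.{20}|.{37})\z)" +                       // total length is 20 or 37
        "(?:" + Allowed + "{16}|" + Allowed + "{19})=" +  // '=' at position 17 or 20
        Allowed + @"*\z");                                // no '=' (or other excluded char) afterwards

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```

The lookahead only checks the length; the rest of the pattern then checks where the single = sits and that every other character is allowed.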

Parameter regex

I need help with creating regex for validating parameter string.
Parameter string consists of 2 optional groups of explicit characters. First group can contain only one occurrence of P, O, Z characters (order doesn't matter). Second group has same restrictions but can contain only characters t, c, p, m. If both groups are presented, they need to be delimited by a single space character.
So valid strings are:
P t
PO t
OZP ct
P tcmp
P
PZ
t
tp
etc.

Why not ditch the regex, stop using a string to represent non-string data, and do this instead:
[Flags]
enum First
{
    None = 0,
    P = 1,
    O = 2,
    Z = 4
}
[Flags]
enum Second
{
    None = 0,
    T = 1,
    C = 2,
    P = 4,
    M = 8
}
void YourMethod(First first, Second second)
{
    bool hasP = first.HasFlag(First.P);
    var hasT = second.HasFlag(Second.T);
}
You could then call YourMethod like this.
// equivalent to "PO mp", but checked at compile time.
YourMethod(First.P | First.O, Second.M | Second.P);
or, if you felt like it
// same as above.
YourMethod((First)3, (Second)12);
If you'd like to know more about how this works see this question.

I don't think a regex is a good solution here, because it will have to be quite complicated:
Regex regexObj = new Regex(
@"^ # Start of string
(?: # Start non-capturing group:
([POZ]) # Match and capture one of [POZ] in group 1
(?![POZ]*\1) # Assert that that character doesn't show up again
)* # Repeat any number of times (including zero)
(?: # Start another non-capturing group:
(?<!^) # Assert that we're not at the start of the string
\ # Match a space
(?!$) # Assert that we're also not at the end of the string
)? # Make this group optional.
(?<! # Now assert that we're not right after...
[POZ] # one of [POZ] (i. e. make sure there's a space)
(?!$) # unless we're already at the end of the string.
) # End of negative lookbehind assertion
(?: # Start yet another non-capturing group:
([tcpm]) # Match and capture one of [tcpm] in group 2
(?![tcpm]*\2) # Assert that that character doesn't show up again
)* # Repeat any number of times (including zero)
$ # End of string",
RegexOptions.IgnorePatternWhitespace);
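A shorter pattern is possible if you exploit the fact that the two groups share no characters and that at most one space is allowed: "no character repeats within a group" then becomes "no character repeats anywhere in the string", which a single negative lookahead can check. This sketch assumes the empty string should be rejected, which the question does not settle. The class and method names are made up.

```csharp
using System.Text.RegularExpressions;

public static class ParameterString
{
    private static readonly Regex Pattern = new Regex(
        @"^(?!.*(.).*\1)" +                   // no character (including the space) occurs twice
        @"(?:[POZ]+(?: [tcpm]+)?|[tcpm]+)\z"  // first group, optionally " " + second group; or second group alone
    );

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }
}
```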

This should give you what you need:
([POZ]+)? ?([tcpm]+)?

Regex match 2 alpha plus 6 digits in C#

I need a regex to match this pattern ( using C# )
My match must start with 2 alpha characters ( MA or CA ) and must end with either 6 or seven numeric digits; such as CA123456 or MA123456 or MA1234567
Here is what I tried:
Regex.IsMatch(StringInput, @"^[MA]{2}|^[CA]{2}\d{6,7}?")
Unfortunately, it seems to match most anything

Try this pattern:
^[MC]A\d{6,7}$
The leading character class ([MC]) requires either an M or a C at the start of the string. Afterwards, \d{6,7} matches either 6 or 7 digits.
The issue with your pattern is the first alternative: ^[MA]{2} matches any string that starts with AA, AM, MA, or MM. It doesn't require any following digits at all. Since the regex engine can match the first alternative for a string like AA1234567 (matching the substring AA), it never even tries the second alternative. This is why "it seems to match most anything".
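The difference is easy to see by running both patterns side by side. Here is a small sketch, with the two patterns copied from the question and the answer; the wrapper class and method names are only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PrefixedNumber
{
    // Original attempt: the first alternative carries no digit requirement,
    // so any input starting with two characters from {M, A} matches.
    public static bool OriginalMatches(string s)
    {
        return Regex.IsMatch(s, @"^[MA]{2}|^[CA]{2}\d{6,7}?");
    }

    // Corrected pattern: M or C, then A, then 6 or 7 digits, then end of string.
    public static bool FixedMatches(string s)
    {
        return Regex.IsMatch(s, @"^[MC]A\d{6,7}$");
    }
}
```

OriginalMatches("MAxyz") is true because ^[MA]{2} alone is satisfied, whereas FixedMatches accepts CA123456 and MA1234567 but rejects AA1234567 and MA12345678.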

I believe there are great usages of RegEx; in this particular case, using the built-in string functions of C# may be a better option:
1. Must start with either MA or CA
2. Must end with at least 6 digits (if there are 7, then there will be 6 digits)
3. Combining 1 and 2, the string must be at least 8 characters long
This would be the string version based on the above rules:
public static bool IsValid( string str )
{
    if( str.Length < 8 )
    {
        return false;
    }
    if( !str.StartsWith( "CA" ) && !str.StartsWith( "MA" ) )
    {
        return false;
    }
    int result;
    string end = str.Substring( str.Length - 6 );
    bool isValid = int.TryParse( end, out result );
    return isValid;
}
